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ABSTRACT 


The  problem  of  state  estimation  for  a class  of  linear  discrete- 
time dynamical  systems  with  unknown  time-varying  parameters  is 
investigated.  Attention  is  focused  mainly  on  systems  with  unknown 
noise  statistics.  Two  different  approaches  to  modelling  and  estimation 
under  time-varying  uncertainties  are  investigated.  In  one  of  the 
approaches,  a finite  state  Markov  chain  model  is  used  for  the  jump 
parameters  which  can  take  values  only  from  a finite  set  with  transitions 
from  one  value  to  another  determined  by  a Markov  transition  probability 
matrix.  The  transition  matrix  may  or  may  not  be  known;  if  unknown,  it 
is  assumed  to  belong  to  a finite  set.  A Bayes  optimal  solution  is 
obtained  in  a recursive  form  and  several  suboptimal  algorithms  are 
discussed  to  alleviate  the  large  storage  and  computation  requirements 
of  the  optimal  estimator.  The  asymptotic  behavior  of  the  optimal 
solution  for  the  case  of  unknown  transition  probabilities  is  analyzed. 
Numerical  examples  are  presented  in  support  of  the  analyses.  In  the 
second  approach,  multiple  bounds  on  the  unknown  parameter  values  and  its 
time  derivatives  are  assumed  to  be  available.  A detection-estimation 
approach  is  proposed  for  state  estimation  and  its  asymptotic  behavior  is 
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analyzed.  The  main  objective  of  the  proposed  approach  Is  to  reduce  the 
pessimism  of  the  standard  minimax  estimator  for  large  observation  records 
and  large  uncertainties,  while  retaining  its  desirable  small-sample 
properties.  The  performance  of  the  scheme  is  illustrated  by  means  of 
simulation  examples. 
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CHAPTER  1 

INTRODUCTION 


1.1.  General 

This  thesis  is  concerned  with  the  estimation  problem  for  a class  of 
linear  discrete-time  dynamical  signal  models  with  unknown  time-varying 
parameter  values.  The  problem  of  estimating  the  state  variables  of  a 
dynamical  system,  given  noisy  observations  of  the  output  variables,  is  of 
fundamental  importance  in  control  and  communication  theory.  In  control 
problems  the  final  goal  is  often  to  design  control  strategies  for  a par- 
ticular system,  which  requires  the  availability  of  the  state  of  the 
controlled  plant.  In  communication  theory  problems,  the  primary  interest 
is  estimation  of  the  state  of  the  system  itself. 

When  the  models  for  the  signal  and  the  noise  are  completely  specified, 
it  is  possible,  at  least  theoretically,  to  obtain  optimal  solutions  to  the 
problem  of  state  estimation  under  various  optimality  criteria  [65,28]. 

The  problem  is  considerably  more  difficult  when  uncertainty  exists  regard- 
ing the  system  parameters,  the  system  model,  or  the  noise  statistics; 
especially  if  the  uncertain  quantities  are  time-varying.  An  extensive 
amount  of  work  has  been  done  on  the  problem  of  state  estimation  and 
identification  for  stationary  Gaussian  signal  models.  A popular  model  is 
the  state  space  representation  where  a linear  time- invariant  system  is 
driven  by  and  observed  in  white  Gaussian  noise.  The  main  advantage  of 
this  model  is  that  it  is  analytically  tractable  while  also  being  not  too 
far  removed  from  "physical  reality". 


In  this  thesis  we  are  concerned  with  a class  of  linear  discrete-time 


dynamical  stochastic  systems  with  time-varying  parameters.  Attention  is 
focused  mainly  on  systems  with  unknown  time-varying  noise  statistics. 

The  motivations  for  such  an  assumption  are  as  follows.  First,  there  exists 
a large  class  of  physical  phenomena  which  can  be  modeled  in  terms  of  linear 
systems  driven  by  (and/or  observed  in)  noise  with  (unknown)  time-varying 
statistics.  Second,  quite  often  we  are  interested  in  approximating  a 
higher  order  system  with  a lower  order  model  (due  to,  e.g.,  limitations  on 
available  computation  and  storage  capabilities),  or  in  approximating  a 
nonlinear  model  with  its  linearized  equivalent.  The  error  in  modeling 
can  then  be  thought  of  as  a noise  process  (possibly  white  and/or  Gaussian) 
whose  statistics  might  change  with  time  depending  upon  the  approximation 
of  the  chosen  model  to  the  original  system.  The  objective  then  becomes 
how  to  choose  the  noise  statistics  to  make  the  approximation  best  possible 
in  some  sense.  We  also  address  the  problem  of  state  estimation  under  un- 
certain observations.  The  unknown  parameters  will  be  assumed  to  switch 
among  a set  of  finite  number  of  values;  hence  the  use  of  the  terms  jump 
or  switching  parameters. 

Two  different  approaches  to  state  estimation  for  systems  with  jump 
parameters  are  investigated.  In  one  of  the  approaches,  the  jump  parameters 
are  assigned  a probabilistic  description.  We  assume  that  the  parameters 
can  take  values  only  from  a finite  set,  with  transitions  from  one  value 
to  another  determined  by  a Markov  transition  probability  matrix.  The  set 
of  values  taken  by  the  parameters  is  assumed  to  be  known.  However,  the 
transition  matrix  may  be  unknown  and  is  assumed  to  belong  to  a finite 
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set.  A Bayesian  approach  for  optimal  state  estimation  may  be  appropriate 
here.  The  second  approach  is  used  only  for  the  case  of  unknown  time-vary- 
ing noise  statistics.  It  is  assumed  that  a statistical  description  is 
not  appropriate  in  this  case;  rather  multiple  bounds  (disjoint  or  nested) 
are  assumed  to  be  available  on  the  unknown  parameter  values  and  its  time 
derivative (s ) . A combined  detection-estimation  approach  is  appropriate 
here,  and  is  based,  to  an  extent,  on  the  approach  given  in  [45,46].  This 
approach  is  essentially  a heuristic  extension  of  the  standard  minimax 
scheme  to  the  case  when  multiple  bounds  on  the  unknown  parameters  are 
available. 

The  model  of  linear  discrete-time  dynamical  system  with  jump 
parameters  can  be  useful  in  many  problems  of  interest.  The  Markov  jump 
process  can  be  used  as  a model  for  periodic  but  random  step  changes  in 
system  parameters  (or  noise  statistics)  where  the  measurement  of  such 
changes  is  not  feasible,  either  for  technical  or  economic  reasons.  A 
major  advantage  of  this  model  is  that  it  allows  some  nonstationary 
environments  with  random  properties  to  be  treated  by  techniques  similar 
to  those  used  for  stationary  Gaussian  disturbances.  To  further  motivate 
the  choice  of  this  switching  model,  consider  the  following  example  with 
switchings  in  the  noise  statistics.  Let  a hypothetical  submarine  be 
constrained  to  randomly  maneuver  only  in  depth.  As  a first  approximation 
assume  that  the  target  (the  submarine)  maneuvers  only  at  discrete  times, 
known  to  the  tracking  vehicle.  The  submarine  will  have  a continuous 
range  in  which  to  choose  its  next  depth.  In  modelling  the  variations 
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in  depth  to  which  the  submarine  can  descend,  certain  discrete  depths 

(states)  d.,d_,...,d  are  chosen.  These  n states  act  as  mean  values 
i t n 

around  which  the  depth  of  the  target  randomly  varies.  Statistically 
modelling  the  maneuvering  target,  we  assume  that  the  transitions  among 
the  n states  can  be  described  by  a Markov  chain.  Both  rapid  and  slow 
maneuvering  of  the  target  can  be  modelled  well  by  suitably  selecting  the 
transition  matrices.  Such  a model  for  maneuvering  targets  has  been  used 
in  [62, 70] . Moose  [43]  used  a more  general  model  of  a semi-Markov 
process.  In  [ 12] , switchings  in  the  driving  noise  covariance  are  used 
to  model  variable  maneuverability  of  an  airborne  target.  The  Markov 
jump  process  model  may  also  be  used  in  problems  such  as  estimation  for 
systems  subject  to  possible  component/subsystem  failures  [1,2,69],  and 
certain  control  applications  [1,48,59-61].  We  also  note  that  the  detection- 
estimation  approach  may  be  used  in  tl.e  case  of  above-mentioned  examples 
if  a statistical  description  of  the  switching  parameters  is  either 
inappropriate  or  unavailable. 

Finally,  the  important  case  of  systems  with  uncertain  observations  may 
be  modelled  by  expressing  the  interrupted  observation  mechanism  in  terms 
of  a two-state  Markov  chain  [44,51-53],  Such  a situation  may  be  caused 

by  an  intermittent  failure  in  the  observation  mechanism.  It  may  also  occur 

l 

in  a tracking  situation  when  sensor  returns  may  originate  from  something 
other  than  a single  object  being  tracked,  such  as  other  objects  being 
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tracked,  new  objects  not  yet  being  tracked,  false  alarms,  clutter,  and 
radio  frequency  interference  [56],  Furthermore,  there  are  situations 
where  the  data  acquisition  process  is  often  interrupted  physically  or 
artificially;  and  a continuous  supply  of  information  about  the  state  of 
the  system  is  impossible.  For  example,  there  exists  a class  of  control 
systems  where  the  observations  are  not  available  at  every  instant  due 
to  either  the  physical  impossibility  or  the  cost  in  taking  observations 
[4]. 

1.2.  Historical  Survey 

As  remarked  earlier  an  extensive  amount  of  work  has  been  done  on  the 
problem  of  state  estimation  and  identification  for  stationary  Gaussian 
models.  When  the  signal  and  the  noise  models  are  completely  known, 
well-known  solutions  exist  [28,65].  A tutorial  introduction  to  the 
problem  of  identification  for  stationary  Gaussian  models  is  available  in 
the  recent  text  by  Goodwin  and  Payne  [22].  An  extensive  survey  of  the 
work  in  identification  may  be  found  in  Astrom  and  Eykoff  [6] ; brief 
surveys  are  available  in  Mehra  [38]  and  Pearson  [47] . More  recent 
results  may  be  found  in  [40]  and  [1974  Dec.,  IEEE  Transactions  on  Automatic 
Control  - Special  Issue] . 

Existing  approaches  to  state  estimation  for  stationary  Gaussian 

i 

models  with  unknown  parameters  can  be  roughly  classified  into  two  major 
categories:  minimax  estimation  [17,47,55]  and  adaptive  estimation 
[6,7,34,22,38,39].  A minimax  solution  to  an  estimation  problem  minimizes 
a given  cost  function  for  the  worst  case  values  of  the  unknown  para- 
meters which  are  assumed  to  belong  to  a compact  set.  The  minimax  schemes 
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yield  estimates  which  are  too  conservative  when  large  observation  records 
are  available  but  are  robust  for  small  samples.  Also,  if  other  than  the 
worst  case  values  of  the  parameter  occur,  the  performance  of  the  minimax 
estimator  usually  deteriorates.  A number  of  approaches  can  be  taken  to 
adaptive  estimation.  Some  of  these  are:  Bayesian  estimation,  maximum  likeli- 
hood (ML)  estimation,  correlation  methods,  covariance  matching  techniques, 
and  prediction  error  methods.  In  the  Bayesian  estimation  schemes  [24,34,36], 
the  unknown  parameters  are  characterized  as  random  variables  with  known 
probability  distribution  and  then  the  Bayesian  minimum  mean-squared  error 
(MMSE)  state  estimate  is  found.  In  the  maximum  likelihood  estimation 
schemes,  first  the  unknown  parameters  are  estimated  using  the  maximum  likeli- 
hood techniques  and  then  these  estimates  are  used  in  the  proper  state 
estimator  algorithm.  Such  schemes  have  good  large  sample  properties  in  that 
the  state  estimate  converges  to  the  optimal  estimate  (corresponding  to  the 
completely  known  model  case)  as  the  length  of  the  observation  record  tends 
to  infinity.  The  Bayesian  scheme  also  has  this  large  sample  property  in 
addition  to  also  being  optimal  at  every  time  instant.  For  small  samples,  the 
maximum  likelihood  schemes  usually  yield  state  estimates  having  large  error 
covariances.  The  other  adaptive  schemes  mentioned  before  are  quite  similar 
to  the  maximum  likelihood  schemes;  they  differ  from  the  ML  techniques  only 
in  that  different  techniques  are  used  to  estimate  the  unknown  parameter  values. 

Both  the  above-mentioned  classes  of  approaches  are  applicable  mainly 
when  the  uncertainties  are  time-invariant.  To  extend  these  techniques  to 
the  time-varying  uncertainties  case,  one  must  provide  some  description  of 
how  the  parameters  vary  with  time.  For  example,  in  [11],  it  is  assumed 
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that  a state-space  model,  driven  by  white  noise,  of  the  nonstationary 
covariance  parameters  is  available.  Similar  assumptions  have  been  made 
in  [9].  In  [7],  a fading  memory  filter  is  proposed  to  track  the  time 
variations  in  the  unknown  noise  covariances.  All  these  extensions  are 
quite  ad-hoc,  and  need  on-line  "tuning."  Finally,  a combined  detection- 
estimation  approach  has  also  been  used  [31,41,21,46,55].  It  is  particularly 
suited  to  cases  where  information  regarding  the  uncertainties  is  available 
in  the  form  of  bounds  (multiple  hypotheses).  So  far,  this  approach  has 
been  found  to  be  intractable  for  the  case  of  time-varying  uncertainties. 

One  of  the  objectives  of  this  thesis  is  to  investigate  alternative 
models  for  characterizing  time-variations  in  the  system  parameters.  The 
objective  is  to  choose  a model  which  is  general  enough  to  describe  a large 
class  of  time-varying  characteristics  of  the  system  parameters,  while, 
at  the  same  time,  is  also  analytically  and  computationally  tractable. 

The  Markov  jump  process  is  one  such  model  for  time-varying  parameters, 
and  is  a statistical  description  of  the  time-variation.  The  bounding 
sets  on  the  values  of  unknown  (time -varying)  parameters  and  their 
derivative(s)  represent  a deterministic  model. 

The  problem  of  state  estimation  in  switching  environments  (i.e., 

Markov  dependent  switchings  in  noise  statistics)  when  the  transition 
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probability  matrix  is  known,  has  been  treated  in  [1,2,67]  and  several 

other  papers.  They  differ  primarily  in  the  types  of  approximation  to  the 

optimal  solution,  which  is  difficult  to  compute  because  of  exponentially 

increasing  computation  and  storage  requirements  with  time.  The  problem 


of  state  estimation  for  systems  with  interrupted  observations  has  been 
treated  in  [26,27,44,51-53].  It  is  similar  to  state  estimation  in 
switching  environments.  Previous  work  on  state  estimation  in  switching 
environments  has  not  investigated  the  problem  of  unknown  transition 
matrices,  a much  less  general  version  of  the  problem  was  considered  by 
Kashyap  [31].  He  considered  identification  of  the  transition  matrix  of  a 
finite  state  Markov  chain  from  zero-memory,  scalar,  nonlinear  observations 
corrupted  by  additive  white  noise.  Sawaragi  et.  al.  [51,52]  were  the  first 
to  address  the  problem  of  state  estimation  for  linear  discrete-time 
systems  with  stationary  interrupted  observation  mechanism  where  the 
statistics  of  the  interruption  process  are  unknown  but  fixed.  In  [5l] , 
the  interruption  process  was  assumed  to  be  an  independent  binary  sequence. 
In  [52],  a more  general  model  of  Markov  interruption  process  was  treated. 

In  both  these  papers  a Bayesian  viewpoint  was  adopted  and  the  objective 
was  to  present  a feasible  adaptive  algorithm  that  sequentially  produced 
an  approximate  minimum  variance  estimator.  The  asymptotic  convergence 
of  the  a posteriori  probability  of  the  transition  probabilities,  given 
the  past  observations,  was  demonstrated  through  simulations.  No 
theoretical  justification  was  provided. 

Additional  comments  regarding  the  previous  work  may  be  found  in  the 

# I 

sequel.  In  the  next  section  we  formulate  the  class  of  problems  discussed 


in  this  thesis. 
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1.3.  Problem  Statement 

In  this  thesis,  two  main  classes  of  problems  are  considered.  In  the 
first  one,  the  switching  parameters  are  statistically  modelled  while  In  the 
second,  multiple  bounds  on  the  values  of  the  unknown  time-varying 
parameters  and  their  time-derivative(s)  are  considered. 

The  system  considered  is  modelled  by  the  state  equation 

xk+l“Axk  + Bwk 

and  Is  observed  in  additive  noise  as 

zk“Ckxk  + vk*  k-0,1,...  (1.2) 

Given  that  x^  is  n-dimensional  system  state  vector  at  time  instant  k; 
z^  is  m-dimensional  measurement  vector;  Xq  ~ N(Xq,Pq);  A and  B are  m x n 
and  n X r constant  matrices,  respectively;  and  is  m x n matrix.  The 
sequences  [w^]  and  [v^)  are  r"  and  m-dimensional  random  sequences, 
respectively. 

In  the  sequel  based  on  the  two  approaches  mentioned  above,  the 
following  three  cases  are  considered;  the  first  two  in  conjunction  with 
the  Bayesian  approach  and  the  third  in  conjunction  with  the  detection- 
estimation  approach. 

t i 

Case  1:  In  this  case,  * C for  all  k,  where  C is  an  m x n constant 
matrix.  The  statistics  of  the  random  sequences  {w^}  and  [v^]  are  time- 
varying  and  are  governed  by  one  of  S multivariate  Gaussian  distributions. 
The  transitions  among  the  distributions  are  Markov,  with  transition 
probabilities  given  by  an  S x S transition  matrix  tt.  Each  allowable 
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Gaussian  distribution  is  such  that 


- V(1) 


oo»<»<l>,.«>)  ■0<1>8kl6u 

cov<vk1>,vJ1>)  -R<1)6kj5u 

cov(w^\v^^)  » 0 V j»k,i,i;  i,/ 
k j 


1,2, ...  ,S 


The  superscripts  refer  to  a specific  Gaussian  distribution  from  the  set 
of  S allowable  distributions.  It  should  therefore  be  noted  that  all  the 
expectations  given  above  are  conditional  on  specific  states  of  the  Markov 
chain  governing  the  transitions. 

The  transition  probability  matrix  rr  may  or  may  not  be  known.  If 
unknown,  it  is  assumed  to  belong  to  a finite  set  which  also  contains  the 
true  transition  matrix  rr  ; that  is,  tt,tt.  € 0*  where  Q1  * [tt.  ,tt, , . . . ,ttu}  . 
Furthermore,  it  is  assumed  that  the  a priori  distribution  on  the  tt's 
in  Cl'  is  known. 

Notice  that  if  the  environment  (defined  by  a specific  Markov  chain 
state  sequence)  is  not  known,  the  system  noise  and  the  measurement  noise 
will  not  be  independent.  Furthermore,  vk+^  will  not  be  independent  of 
Xj^  since  w^  and  v^+^  are  correlated;  w^  influences  the  determination 
of  However,  when  the  environment  is  known,  the  dependence  on 

environment  can  be  removed  to  give  independent  Gaussian  noise  in  both 
the  system  and  the  measurement  equations. 
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Case  2:  In  this  case,  ■ y^C  where  C Is  an  m x n constant  matrix. 

(Vk3  Is  a stationary  Markov  chain  that  takes  on  values  of  0 or  1,  and 
Is  called  the  Interruption  process  in  the  sequel.  The  sequences  [w^] 
and  {vk}  are  zero-mean  white  Gaussian  processes  with  constant  covariances 
Q and  R,  respectively,  where  R is  assumed  to  be  a positive  definite 
symmetric  matrix.  The  random  sequences  wk>  vk  and  y^  are  mutually 
Independent  and  are  also  Independent  of  Initial  system  state  x^. 

Let  tt  be  the  transition  probability  matrix  of  the  interruption 
process  with  elements 

pij  “ Ptyk  “ ilvk_i  " J3  " 0,1. 

It  should  be  noted  that  the  transition  matrix  tt  is  characterized  by  two 
values  Pqq  and  p^.  It  is  assumed  that  the  transition  matrix  tt  is  an 
unknown  constant  and  can  take  values  only  from  a finite  set 
O'  - {tt.  ,tt«  , . . . ,tt„}  , which  also  contains  the  true  transition  matrix  tt  . 
Furthermore,  the  a priori  distribution  on  the  tt's  is  assumed  to  be  known. 
Case  3:  In  this  case,  Ck  * C for  all  k,  where  C is  an  m x n constant 
matrix.  The  sequences  [wk}  and  {vk]  are  independent,  white,  zero-mean 
Gaussian  processes  with  covariances  Qk  and  Rk,  that  are  positive 
semi-definite  and  positive  definite,  respectively.  The  given  model  is 

i 

specified  up  to  a set  of  unknown  parameters,  possibly  time-varying, 
denoted  by  the  vector  9(k)  (at  time  k)  of  dimension  q.  This  uncertainty 
may  be  in  Qk  or  Rk  only,  which  are  assumed  to  be  continuous  functions 
of  9(k).  The  unknown  parameter  vector  9(k)  is  assumed  to  satisfy  at 
least  one  of  the  following  conditions 


H|paaM 
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e(k)  eni,  i - 1,2,. ...n 

where  Cl i ■ 1,2,...,N  is  a collection  of  compact  subsets  of  Bq.  The 

sets  may  be  disjoint  or  nested.  Furthermore,  the  sets  0^  may  specify 

bounds  on  values  of  9 (k)  and  its  time -derivative (s ) . 

The  main  objective  in  all  the  three  cases  is  to  obtain  an  estimate 

of  the  value  of  x^  from  the  past  observations  , 0<  j < k}. 

1.4.  Thesis  Outline 
— 

In  the  following  chapters  we  study  the  problem  of  state  estimation  for 
a class  of  linear  discrete-time  dynamical  systems  with  unknown  time-varying 
parameters.  Two  different  approaches  to  the  problem  are  investigated.  In 
Chapters  2 through  4 the  jump  parameters  are  assigned  a probabilistic 
description.  In  Chapters  5 and  6 multiple  bounds  on  the  unknown  covariances 
and  their  time  derivatives  are  assumed  to  be  available. 

In  Chapter  2 we  use  a finite  state  Markov  chain  model  for  the  jump  para- 
meters. The  probability  transition  matrix  is  assumed  to  be  known.  Both  optimal 
MMSE  state  estimator  and  suboptimal  approximations  to  it  are  considered  in  this 
chapter.  In  Chapter  3 the  transition  matrix  is  no  longer  assumed  to  be  known. 
Attention  is  confined  to  jumps  in  noise  statistics  only.  Both  optimal  and 
suboptimal  Bayesian  adaptive  state  estimators  are  investigated.  Considerable 
attention  is  paid  to  the  asymptotic  behavior  of  the  optimal  adaptive  scheme. 

In  Chapter  4 we  consider  the  problem  of  adaptive  state  estimation  for  systems 
under  uncertain  observations  with  unknown  statistics. 
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In  Chapters  5 and  6 a detection-estimation  scheme  is  investigated 
for  state  estimation  when  information  regarding  the  uncertainties  is 
assumed  to  be  available  in  the  form  of  multiple  bounds.  In  Chapter  5 
we  consider  only  time -invariant  uncertainties  in  the  noise  covariances. 
In  Chapter  6 the  unknown  noise  covariances  are  allowed  to  be  time- 
varying.  Finally,  the  work  is  summarized  in  Chapter  7. 


i 


A 


■ 
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CHAPTER  2 

ESTIMATION  FOR  SYSTEMS  WITH  MARKOVIAN  SWITCHING  PARAMETERS 

2.1.  Introduction 

In  this  chapter  we  consider  the  problem  of  state  estimation  for 
Cases  1 and  2 of  model  (1.1)-(1.2)  (see  Section  1.3).  The  transition 
probability  matrix  tt  is  assumed  to  be  known.  In  order  to  avoid  cumbersome 
notational  problems,  attention  is  confined  mainly  to  state  estimation 
in  switching  environments,  i.e..  Case  1;  although  the  results  are 
applicable  to  more  general  models  including  Case  2. 

Ackerson  and  Fu  [l]  appear  to  have  been  the  first  to  treat  the  problem 
of  state  estimation  in  switching  environments.  They  noted  that  in  order 
to  evaluate  the  minimum  mean-squared  error  (MMSE)  estimate  of  the  system 
state,  the  computational  requirements  increase  exponentially  with  time, 
which  makes  the  optimal  scheme  impractical.  They  then  proposed  a sub- 
optimal  algorithm  assuming  that  the  a posteriori  probability  density  of 
the  state  given  the  past  observations  is  Gaussian.  An  entirely  different 
approach  was  taken  by  H.  Akashi  and  H.  Kumamoto  [2],  In  their  algorithm, 
the  set  of  Markov  chain  state  sequences  which  characterize  the  MMSE 
estimate  is  regarded  as  a population  and  the  estimate  is  calculated 
with  a relatively  small  number  of  sequences  sampled  at  random  from 
the  population.  (Other  approximations  given  in  [13,51,67]  are  closely 
related  to  the  approach  of  Ackerson  and  Fu  [l]. 


ft 


The  suboptimal  estimator  of  [ l]  is  rather  ad-hoc.  Its  main 
justification  is  that  the  resulting  algorithm  is  simple.  Moreover,  its 
performance  is  usually  good.  However,  simulation  examples  are  available 
in  [2]  which  indicate  that  this  approach  does  not  always  work  so  well. 
The  random  sampling  approach  of  [2]  is  a better  approximation  to  the 
optimal  estimator.  However,  it  can  be  computationally  demanding  as  a 
fairly  large  number  of  "samples"  (of  the  state  sequences)  are  needed  (as 
in  any  random  sampling  scheme).  In  this  chapter  we  investigate  a 
detection-estimation  type  of  approach  which  is  intuitively  and  mathe- 
matically more  pleasing  than  the  approach  of  Ackerson  and  Fu  [1],  while 
also  being  computationally  less  demanding  than  the  random  sampling 
approach  of  [2]. 

In  Section  2.2,  an  MMSE  estimator , optimal  in  the  Bayesian  sense, 
is  derived.  The  proposed  detection-estimation  approach  is  described  in 
Section  2.3.  Simulation  results  are  given  in  Section  2.4  comparing 
the  three  approaches — pseudo-Bayes  [ l] , random  sampling  [2],  and 
detection-estimation. 

2.2.  Optimal  Estimator 

We  consider  Case  1 of  model  (1.1)-(1.2).  The  transition  matrix  tt 
is  assumed  to  be  known.  Let  Z^  = [z^,  0 4 i s k}  denote  the  collection 
of  past  observations  at  time  k.  The  optimal  estimate  x(k|k)  of  the 
system  state  x^  is  defined  to  be  any  function  of  which  minimizes  the 
mean-square  error.  It  is  well  known  that  the  MMSE  estimator  is  given  by 


the  conditional  mean 
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*<k|k)  = E{xjz,} 


(2.1) 


Define  a Markov  chain  state  sequence  I(k)  as 


I(k)  - { ig> ij » • • • » i^J  » i-j  £ 


(2.2) 


where  i^  represents  the  Markov  chain  state  at  time  j;  i.e.,  it  is  the 
index  of  the  distribution  from  which  w^  and  v are  sampled.  Let  the 
estimate  conditional  on  a specific  sequence  of  states  be  denoted  by 


x.(k|k)  = E[xk|zk,Ij(k)] 


(2.3) 


where  I^(k)  is  a specific  sequence  from  the  space  0^  of  the  sequences 

k+1 

I(k).  Note  that  the  space  contains  S elements. 

On  using  the  preceding  definitions  one  obtains 


x(k|k)  = E x (k|k)P(I  (k)|Z  ) 


(2.4) 


where  P(Ij(k)|Zk)  denotes  the  conditional  probability  of  the  sequence 
Ij(k)  given  the  observations  Z^.  The  application  of  Bayes'  rule  further 
yields 


p<yk>izk>  ■ -kir 

O 


f(zklii(k),zk_1)P(ii(k)lzk_1) 


(2.5) 


f('klI/<k)>zk-i)p<I«<k,!zk-i) 


where  f (zk| I (k) ,Zk_^)  is  the  conditional  density  of  the  k-th  observation 
sample  zk  given  the  earlier  observations  Zk  and  the  particular  Markov 


chain  sequence  I^(k).  Furthermore,  we  have 


1 


17 


P(Ij(k)|zk-1)  = p(IJj(k"l)|zk.1)P(ik  = ik_!  = n) 


(2.6) 


where  m,  n and  the  sequences  I.(k)  and  I (k-1)  are  defined  by  the  relations 

J " 


Ij(k)  = [I£(k-l),ik  = m} 

1£ (k-1 ) — {io>ii>**’»^k-2’ ^k-1  ~ 


(2.7) 

(2.8) 


and  P(ik  = m| ik_^  = n)  = probability  that  the  noise  sample  at  time  k is 
from  the  m distribution  given  that  the  noise  sample  at  time  k-1  is  from 
the  nt'1  distribution. 

Now  f (z, | I.(k),Z,  .)  and  x.(k|k)  can  be  calculated  recursively  by 
K j K-i  J 

applying  Kalman  filtering  methods  to  the  system  model  (1.1)-(1.2).  The 
initial  condition  Xq  has  a Gaussian  distribution.  Given  the  sequence 
lj(k),  the  system  (1.1)-(1.2)  is  now  a linear  system  with  Gaussian  noise. 
Therefore,  equations  for  x^(k|k)  are  those  for  a Kalman  filter  [l], 
given  by 

Xj  (k|  k)  =■  AXj(k-l|k-l)  +BM.(ni)  + K (k)[zk-CAx^  (k-l|  k-l)-CBp. (m)  - v(m)]  (2.9) 

(2.10) 

M (k)  = AP  (k-l)AT  + BQ(m)BT  (2.11) 

J ** 


Kj  (k)  = Mj(k)CT[CMj(k)CT  + R(m)]“l 


P.(k)  = Mj(k)  - Kj (k)CMj (k) 


(2.12) 


Note  that  Pj(k)  is  the  conditional  covariance  of  the  filtered  estimate, 
while  Mj(k)  is  the  conditional  covariance  of  the  one-step  predicted 
estimate.  From  (2.7)  and  (2.8)  it  is  clear  that 
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f(xk-l'Ij(k)>Zk-l)  " f(xk-l|li(k‘l)»Zk-l)  ~ N(xA(k-l|k-l),PA(k-l))  (2.13) 
which  further  implies 

f(rk!lj(k),Zk_1)  ~ NfC(Ax£(k-l|k-l)  + B^(m))  + v (m) .CMj (k)CT-m(m)}  (2.14) 

where  N(n,,D)  is  the  standard  notation  for  normal  density  with  mean  p,  and 
covariance  matrix  D.  The  expressions  (2.3)-(2.14)  yield  the  recursive 
optimal  MMSE  estimate  x(k|k). 

It  is  obvious  from  Equation  (2.4)  that  the  optimal  estimator  requires 

an  exponentially  increasing  memory.  Specifically,  at  the  kth  step,  Sk+^ 

Kalman  filters  are  run  to  compute  Xj(k|k).  Moreover,  calculation  of 

P(Ij (k) | Z^)  requires  knowledge  of  Sk  probabilities  P(I^(k-l) |zk_^).  So 

the  procedure  given  in  this  section,  while  optimal,  is  not  very  practical. 

To  circumvent  this  difficulty,  various  approximations  to  the  optimal  MMSE 

estimator  have  been  proposed  in  the  literature  [1,2,13,67].  The  sub- 

optimal  estimator  of  [l],  called  the  pseudo-Bayes  approximation,  is 

summarized  in  Appendix  A.  The  random  sampling  approach  [2]  is  discussed 

in  Appendix  B.  In  the  next  section,  we  discuss  a new  approximation  to 

the  optimal  estimator. 

2.3.  Detection-Estimation  Approach 

In  this  section,  we  discuss  a new  approximation  to  the  optimal  state 

estimator  in  switching  environments.  Recall  that  in  the  optimal  solution, 

k+1 

at  stage  k,  we  need  S Kalman  filters  matched  to  all  possible  Markov 

k+1 

chain  state  sequences  Ij(k),  j ■1,2,...,S  . The  objective  now  is  to 


. m . it  ... 


control  the  number  of  Kalman  filters  to  be  a finite,  and  below  a 
maximum  allowable,  number.  Instead  of  carrying  the  probabilities 
P(Ij(k)|z^)  in  (2.4)  for  all  j,  we  disregard  some  of  the  "unlikely" 
sequences,  i.e. , we  do  not  process  the  Kalman  filters  matched  to 
these  unlikely  sequences. 

Consider  the  conditional  probability 


P(I.(k)|Zk)  = f(Zk|l.(k))P(Ij(k))/f(Zk), 


(2. 


in  which  the  denominator  is  common  for  all  values  of  j , so  that  only  the 
numerator  need  be  considered.  Given  Ij(k),  the  conditional  density 
f (Zk| Ij (k) ) is  a Gaussian  density  function  and  can  easily  be  computed 
recursively  using  Kalman  filtering  methods  [l].  Let  M denote  the 
maximum  number  of  state  sequences  (or  equivalently,  Kalman  filters)  to 
be  processed  at  any  stage  k.  Suppose  that,  at  stage  k-1,  we  have 
selected  Nk_^  state  sequences  (Nk_^  < M).  At  stage  k,  consider  all 
possible  "extensions"  of  these  Nk_^  state  sequences.  By  an  extension 
I(k)  of  a state  sequence  I^(k-l)  we  mean 


I(k)  = [i  (k-l).i(k)  = m], 


m a 1,2,... ,S, 


so  that  S extensions  of  I^(k-l)  are  possible.  Out  of  these  Nk_^  X S 
state  sequences,  we  retain  Nk_^  sequences  which  satisfy 
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i Zk  " zj(klk_1>ll  p~ 1 ( k | k-l;z)  “ P 


(2.16) 


where 


and 


Zj(k|k-1)  - prediction  of  based  on  the  observation  Zk_^ 
assuming  an  extension  (indexed  by  j)  of  one  of 
the  ^ sequences  to  be  true 

Pj(k|k-l;z)  * corresponding  prediction  error  covariance 
0 ■ a positive  number. 


In  other  words,  let  Ij (k)  be  an  "extended"  sequence  where 
j = 1,2, . . . , (Nk_^  X S).  Then 


and 


Zj(k|k-1)  =•  E[zk|Zk_1,Ij(k)} 

P (k|k-l;«)  - E[[zk  - 2j(k]k-l)][zk  - Zj (k| k-l)]T| I (k)} 


Note  that  if  the  assumed  state  sequence  I^(k)  is  indeed  true,  then  the 
left-hand  side  of  (2.16)  is  a chi-square  random  variable  with  m degrees 
of  freedom  where  m is  the  dimension  of  zk#  Hence  0 can  be  selected  to 
correspond  to  a fixed  probability  of  rejecting  the  correct  state  sequence 
(the  probability  of  a "miss").  Thus  the  criterion  (2.16)  results  in  the 
selection  of  Nk_^  "most  likely"  state  sequences.  Note  that  the  test  (2.16) 
represents  the  requirement  that  a candidate  state  sequence  should  lie 
in  the  ellipsoid  of  a given  probability  concentration. 


|zk-zj(k|k-l)rp-l(k|k.1.z)  ft  (zk-zj(k|k-l))Xp‘1(k|k-l;z)(zk-2j(k|k-l)) 
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Now  if  ^ M,  we  may  carry  along  all  these  state  sequences  and 

set  = N^  But  if  M < N^  then  M "most  likely"  candidate  sequences  are 
selected  according  to  the  following  procedure.  Arrange  f (ZjJ  Ij(k))P(I^(k)), 
j = 1,2,...,N^  in  decreasing  order  of  magnitude  and  select  the  state 
sequences  I^(k)  corresponding  to  the  first  M probabilities  f (ZjJ  1^ (k) )P( 1^ (k) ) 
as  the  M candidate  sequences  to  be  carried  forward,  resulting  in  Nfc  ■ M. 

Equation  (2.4)  is  now  modified  as 


x(k|k)  - Zk  x (k|k)P(I  (k)|Z  ) 
j-1  1 J 


(2.17) 


where 


f(Z  |l  (k))P(I  (k)) 

p(yk),zk)  --hH 1 — 


(2.18) 


Z f(Zji  (k))P(i,(k)) 


j 


Notice  that  the  subscript  j now  indexes  the  selected  N^  state  sequences. 

A further  refinement  of  this  procedure  may  be  required.  It  is  possible 

that  two  state  sequences  which  differ  in  "early"  states  (and  have  the  same 

"recent"  states),  may  merge  into  each  other  in  that  the  predictions 

z j (k| k-1)  based  on  either  sequence  are  almost  the  "same".  Under  such  a 

situation,  there  is  no  point  in  carrying  along  both  these  sequences.  We 

3hall  regard  two  state  sequences  I (k)  and  I (k)  as  distinguishable  only 

J 

if  the  Bhattacharyya  distance  (B-distance)  [29]  between  the  conditional 
probability  density  functions  f (zk|Zk_^,Ij (k))  and  f ( z ^ | Zk_^, (k))  is  more 
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than  some  real  number  a.  Out  of  the  two  sequences  I^(k)  and  I^(k)  f°r 
which  the  B-distance  d(I  (k),I  (k))  < a,  we  choose  to  keep  that  sequence 

J ** 

for  which  the  corresponding  prediction  error,  in  the  sense  of  (2.16),  is 
smaller.  Then,  in  (2.18),  the  probability  P(I^(k))  is  appropriately 
modified  by  "absorbing"  the  probability  of  the  dropped  sequence. 

The  motivation  for  calling  the  proposed  scheme  a detection-estimation 
approach  should  be  apparent  by  now.  A fixed  number  of  most  likely  state 
sequences  (detection)  characterize  the  state  estimate  (estimation) . 
Selection  of  M 

Recall  that  M is  the  maximum  allowable  number  of  state  sequences  to  be 
carried  forward  at  any  stage  k.  Consider  the  problem  where  the  probability 
of  no  jump  from  a given  state  to  any  other  Markov  chain  state  equals  p and 
is  the  same  for  all  the  states.  Then  the  probability  of  a jump  (to  any 
other  state)  equals  q = 1-p.  Furthermore,  let 

N = "block"  length  over  which  processing  is  to  be  done 
■ k+1  at  stage  k 

Then,  average  number  of  jumps  in  the  block  ■ Nq,  with  standard  deviation  = 

k k 

(Nqp)  . Therefore,  number  of  jumps  < Nq  + 3(Nqp)^  with  probability  > 0.99, 

if  N is  large.  Now,  let 

J * number  of  jumps  in  a given  block  length 

L = number  of  different  state  sequences  of  length  N 

Then,  with  probability  > 0.99,  we  have 
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L<  S[  2 (N'1)(S-Dk]  (2.19) 

k-0 

where  S is  the  number  of  states  in  the  Markov  chain.  The  term  inside  the 
square  brackets  in  (2.19)  gives  the  maximum  number  of  different  sequences 
with  the  "first"  state  fixed  and  with  number  of  jumps  < J.  As  examples 
consider  the  following  cases: 

Example  2.1:  S ■ 2,  N * 10,  q = 0.1,  p = 0.9 
Then  J ~ 4 

Therefore,  L < 512  as  opposed  to  2^  sequences  in  the  optimal  estimator. 
Example  2.2:  S ■ 2,  N = 10,  q = 0.01,  p ■ 0.99 
Then  J ~ 1 and  L < 20  as  opposed  to  2^°  sequences. 

Example  2.3;  S = 2,  N = 10,  q = 0.99,  p = 0.01 

Define  J = number  of  "no"  jumps  in  block  length  N 

Also,  let  q'  * p and  p'  * q,  and  use  q'  and  p'  in  (2.19) 

instead  of  q and  p.  Then,  J ~ 1 and  L < 20  as  in  Example  2.2. 

Notice  that  the  given  procedure  is  useful  only  in  the  extreme  cases 
of  very  high  or  very  low  jump  probabilities  as  in  Examples  2.2  and  2.3. 
Otherwise  the  choice  of  M is  usually  dictated  by  the  computation  and 
storage  capabilities  of  the  processor. 

2.4.  Numerical  Example 

In  this  section,  a simulation  example  is  presented  to  compare  the 
three  (suboptimal)  approaches  to  state  estimation  in  switching  environments  - 
pseudo-Bayes  (that  of  [1]),  random  sampling,  and  detection-estimation.  The 
example  has  been  taken  from  [2], 
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Consider  a scalar  dynamical  system  described  by  the  following  equations: 


Xk+1  " l-°*  Xk  + wk 


+ K(ik)vk  a^k^ ’ k = 0,1,2,.., 


S - 2,  i.e.,  ik  6 {1,2}. 


Initial  conditions  are:  x^  ~ N(30,400).  In  the  actual  system  we  use 
Xq  * 1 for  simulation  purposes,  [w^]  and  {v^}  are  zero-mean  white  Gaussian 
sequences  with  covariances  Q = 0.1  and  R = 1.0,  respectively.  The 
transition  probabilities  are  given  by 


p<lk+i-1l1k-1>-°-85 
p<1k+i-2l1k-1>-°-15 
p<1k+i  ■ 2>lk  ’ 2)  ■ °-3 

P<1kfl  - l|ik  - 2)  ■ 0-7 

Furthermore,  P(1q  = 1)  = P(ig  = 2)  = 0.5.  The  three  suboptimal  estimators 
were  simulated  and  their  performances  are  compared  in  Figs.  2.1  and  2.2. 

The  performances  were  averaged  over  50  Monte  Carlo  runs  (same  as  in  [2]). 

For  Fig.  2.1,  K(l)  = 40,  K(2)  - 1,  a(l)  = a(2)  = 0.  For  Fig.  2.2,  K(l)  - 40, 
K(2)  = 1,  a(l)  = 50,  a(2)  = 0.  For  both  these  figures,  the  random  sampling 
approach  was  applied  using  100  samples  of  state  sequences  (as  in  [2]) 
whereas  the  detection-estimation  approach  was  applied  using  M = 20.  That  ' 
is,  100  filters  were  needed  for  the  random-sampling  approach  whereas  only 
20  were  used  for  the  detection-estimation  approach.  Furthermore,  for  the 
detection-estimation  scheme,  6 ■ 6.63  in  (2.16)  which  corresponds  to  a 
"miss"  probability  of  0.01.  The  B-distance  feature  was  not  incorporated. 


RMS  Error 


svA 


P Si 


R 

A p A .0 
' ’/”* 


_ v 

o—o  Pseudo  Bayes 

a— a Detection-Estimation  (Proposed) 

^ — v Random  Sampling 


V-/ 

0 

10 

20 

30 

rr 

k 

FP-6097 

Figure  2.1. 

1 . i 

Comparison  of  the  RMS  errors  in 
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Figure  2.2.  Comparison  of  the  RMS  errors  in  s.tate  estimates  due  to  the 
suboptimal  estimators,  for  K(l)  ■ 40,  K(2)  ■ 1,  a(l)  ■ 50, 
a(2)  - 0. 
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From  both  these  figures,  it  is  clear  that  both  the  random-sampling 
approach  and  the  detection-estimation  approach,  are  superior  to  the 
pseudo-Bayes  approximation.  The  performance  of  the  random-sampling  and 
the  detection-estimation  approaches  are  quite  close  to  each  other. 

However,  from  a computational  viewpoint,  the  detection-estimation  approach 
is  preferable.  It  should  be  noted  that  a random  sampling  scheme  has  to 
use  a fairly  large  number  of  samples  in  order  for  the  results  to  have 
some  statistical  significance. 

The  results  of  more  extensive  simulations  are  shown  in  Table  2.1  for 

parameter  values  K(l)  ■ 40,  K(2)  * 0,  a(l)  ■ 50,  a(2)  = 0.  The  combined 

ensemble-  and  time-averages  of  r.m.s.  errors  due  to  the  three  suboptimal 

estimators  are  compared.  The  r.m.s.  error  in  the  state  estimate  was 

first  averaged  over  50  Monte  Carlo  runs,  then  it  was  averaged  over  30 

stages.  Let  D£  denote  the  combined  ensemble-  and  time-average  of  r.m.s. 

error  in  the  state  estimate,  and  let  D (k)  denote  the  ensemble -average 

e 

(over  50  Monte  Carlo  runs)  of  r.m.s.  error  at  time  k.  Then  we  have 


1 29 

D‘  • 55  k!0  Vk>- 

Table  2.1  shows  the  average  r.m.s.  error  Dt.  For  the  detection-estimation 
approach,  the  maximum  allowed  number  M of  the  Kalman  filters  was  varied 
from  5 to  20  in  steps  of  5.  For  the  random  sampling  approach,  the 
number  of  sequences  sampled  were  varied  from  10  to  100.  It  is  clear  from 
the  table  that,  for  a given  degree  of  accuracy,  the  detection-estimation 
approach  is  computationally  superior  to  the  random  sampling  approach. 
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TABLE  2.1 


Average  r.m.s.  error  D£  for  the  three  suboptimal  algorithms. 


Scheme 

Average  r.m.s.  Error  Dt 

Pseudo-Bayes 

28.5814 

Detection-Estimation 

with  M = 20 

17.2223 

15 

17.2217 

10 

17.1503 

5 

17 . 7486 

Random  Sampling  with 

Number  of  Sequences 

Sampled  * 100 

17.2256 

60 

17.5082 

40 

17.6131 

20 

18.4488 

10 

19.7364 
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2.5.  Discussion 

A combined  detection-estimation  type  of  approach  has  been  proposed 
for  state  estimation  in  linear  dynamical  systems  operating  in  switching 
environments.  Simulation  results  were  presented  which  indicated  the 
advantages  of  the  proposed  approach  over  the  pseudo-Bayes  and  the 
random  sampling  approaches. 

We  note  that  the  proposed  approach  can  easily  be  generalized  for 
state-estimation  in  a larger  class  of  linear  discrete-time  systems, 
namely,  linear  systems  with  Markovian,  jump  parameters  where  jumps  may 
occur  in  noise  statistics  or  in  matrices  A,  B and  C.  Also,  the  matrices 
A,  B and  C may  be  taken  to  be  time-varying. 
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CHAPTER  3 

ADAPTIVE  ESTIMATION  IN  SWITCHING  ENVIRONMENTS 


3.1.  Introduction 

In  the  previous  chapter  we  considered  the  problem  of  state  estimation 
in  switching  environments  under  the  assumption  that  the  transition  probabil- 
ity matrix  was  known.  This  restriction  is  removed  in  this  chapter.  We 
assume  that  the  transition  matrix  is  unknown  and  can  take  values  only  from 
a finite  set,  which  also  contains  the  true  value  of  the  transition  matrix. 
Previous  work  on  state  estimation  in  switching  environments  has  not 
investigated  the  problem  of  unknown  transition  statistics  considered  above. 

A Bayesian  approach  is  adopted  for  the  solution  of  the  problem.  The 
objectives  are  to  present  a feasible  adaptive  algorithm  that  sequentially 
produces  an  approximate  MMSE  estimator,  and  to  investigate  the  asymptotic 
behavior  of  the  Bayes  optimal  adaptive  estimation  scheme.  Specifically, 
conditions  are  investigated  under  which  the  constrained  maximum  likelihood 
estimate  of  the  transition  matrix  converges  in  probability  to  the  correct 
value  of  the  transition  matrix.  Then  convergence  of  the  a posteriori 
probability  of  the  transition  matrix  given  the  past  observations  is 
obtained.  Some  results  on  the  convergence  of  the  performance  of  the 
optimal  adaptive  scheme  are  also  presented. 
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In  Section  3.2  an  MMSE  estimator,  optimal  in  the  Bayesian  sense,  is 
derived.  Suboptimal  estimators  are  discussed  in  Section  3.3.  In 
Section  3.4  conditions  for  the  convergence  of  the  estimates  (maximum 
likelihood  and  Bayes)  of  the  transition  matrix  are  discussed.  Convergence 
in  performance  is  given  in  Section  3.5.  An  example  to  illustrate  the 
convergence  results  is  given  in  Section  3.6. 

Note:  In  the  following  sections,  the  distinction  between  a random  vector 

and  its  particular  realization  (value)  will  not  (always)  be  made  in  the 

notation,  rather  it  will  be  left  to  context.  Also,  unless  otherwise 

Z 

indicated,  the  probability  measure  used  throughout  is  P which  is  a 

nt 

probability  measure  induced  on  the  range  space  of  the  random  sequence 

Z = fz.,...,z  } parameterized  by  the  correct  value  tt  of  the  transition 
n L 0 n t 

matrix  tt. 

3.2.  Optimal  Estimator 

We  consider  Case  1 of  model  (1.1)-(1.2).  The  transition  matrix  tt 
is  unknown  and  can  take  values  only  from  a finite  set  {tt^  ,tt^  , . . . .rr^}  . 

Let  P(tt^)  = p{tt  = tt^}  denote  the  a priori  probability  of  the  qC^  transi- 
tion matrix.  The  problem  is  to  obtain  an  estimate  of  the  state  x^  after 
each  observation  and  to  investigate  its  asymptotic  behavior. 

Let  Z^  = {z.  , 0<  i<  k}  denote  the  collection  of  past  observations 
at  time  k.  The  optimal  estimate  x(k| k)  of  the  system  state  x^  is  defined 
to  be  any  function  of  Z^  which  minimizes  the  mean-square  error.  It  is 
well  known  that  the  MMSE  estimator  is  given  by  the  conditional  mean 


x(k|k)  = E C xk  I Z R} 


(3.1) 
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i m 1 


Let  x(k|k;rr^)  = E(xk|Zk,TT^}  denote  the  conditionally  optimal  estimate  of 

x^  given  and  Z^.  Let  P(tt  |Zk)  denote  the  conditional  probability  of 

tt  given  data  Z,  . Then  we  have 
q K 

M 

E 

q=l 


x(k|k)  = E x(k]  kjTT  )P(tt  | Z,  ) 

q q it 


(3.2) 


The  use  of  Bayes'  rule  results  in 


P<VV 


f (ZklTTq)P(TTq) 

E f(Zk|n  )P(n  ) 
q=l  n n 


(3.3) 


where  f(Z^|n^)  is  the  conditional  density  function  of  Z^  given  tt  = tt^. 

It  should  be  noted  that  x(k|k;n^)  has  already  been  derived  in  Section  2.2, 
and  parts  of  the  derivation  are  repeated  here  due  to  changes  in  notation 
caused  by  tt  . Let  the  Markov  chain  state  sequence  be  denoted  by  I(k)  and 
let  Ij(k)  denote  a specific  such  sequence  as  in  Equation  (2.7).  Similarly, 
the  estimate  conditional  on  a specific  sequence  of  states  is  denoted  by 


Xj(k|k;nq)  ^ E{xk| ,Zk, 1^ (k)} 


E[xklZk*Ij(k)) 


x (k| k) 


(3.4) 


It  is  seen  that  the  estimate  does  not  depend  on  the  transition  matrix  tt  . 

q 

On  using  the  preceding  definitions  one  obtains 


x(k|  k;Trq) 


E x (k|k)P(I  (k) | Z tt  ) 

J j q 


(3.5) 


A I U 


I 


where  0^  is  the  space  of  all  sequences  I(k).  The  application  of  Bayes' 


rule  further  yields 


p<yk>izk’V  ’ 


t(ZklI3<k)’Zk-l>P(It<lt)lZk-f"g) 


(3.6) 


Zml  £<zklIi<k),Zk-l)P(Ii<k)lZk-l’,,q) 


where  we  write  f (z^j 1^ (k) .Zk_^)  for  f (zk|  1^  (k)  .Z^^tt^)  as  the  latter 
does  not  depend  on  n . Furthermore,  we  have 

P(IJ(k)|Zk_1,TTq)  = P(X£(k-l>  |Zk_1,TTq>P(lk=m!  ik_X“^»TTq)  (3.7) 

where  I, (k), I (k-1),  m and  n are  as  defined  in  (2.7)-(2.8)  and 
J 

P(ik"Blik-l“n  ,tt  ) = probability  that  the  noise  sample  at  time  k is  from 
the  mth  distribution  given  that  the  noise  sample  at  time  k-1  is  from  the 
nC^  distribution  and  given  that  tt  * rr  . 

Now  f(zj  (k) »zk_j_)  and  Xj(k|k)  can  be  calculated  recursively  by 
applying  Kalman  filtering  methods  to  the  system  model  (1.1)-(1.2),  as  in 
Equations  (2.9)-(2.14). 

It  remains  to  calculate  P(n  |Z.  ) to  complete  calculation  of  x(k|k). 

q k 

We  have 


f<zkl" ,>  • 

^ n=0 


(3.8) 


where 


£<’0lyz-i> 4 *<*0lv 
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and 


f<2k,zk-rV 


2 f(z  1 1 (k),Z )P(I  (k)|n  ,Z  ) 

I.(k)€nk  k J 1 J q k 1 


(3.9) 


Expressions  for  quantities  on  the  right  side  of  Equation  (3.9)  have  already 

been  obtained  in  recursive  form.  Hence  f ( Z ^ | tt^ ) can  be  calculated 

recursively  for  each  n . Then  (3.3)  may  be  used  to  compute  P(tt  |z,  ) for 

q q K 

each  tt  . 

q 

As  pointed  out  earlier  in  Section  2.2  and  as  is  obvious  from  Equation 
(3.5),  the  optimal  estimator  requires  exponentially  increasing  storage 
(memory)  and  computation  capabilities.  In  the.  next  section  we  discuss 
approximations  to  the  optimal  estimator  in  order  to  reduce  the  large 
storage  and  computation  requirements. 

3.3.  Approximations  to  the  Optimal  Estimator 

In  this  section  we  briefly  discuss  several  approximations  to  the 
optimal  estimator  in  order  to  reduce  the  large  storage  and  computational 
requirements  of  the  optimal  estimator.  These  approximations  are  based  on 
those  outlined  in  Chapter  2. 

We  want  to  develop  approximations  to  the  optimal  estimator  given  by 
(3.2).  Now  any  of  the  three  approximations  (random  sampling,  pseudo-Bayes, 
detection-estimation)  discussed  in  Chapter  2 may  be  used  to  approximate 
x(k|k;TTq),  given  by  Equation  (3.5),  for  a fixed  value  of  tt.  So  the 
procedure  has  to  be  repeated  M times  to  get  approximations  to  x(k|k;n^) 
for  each  admissible  tt  , q = 1,2,...,M.  It  remains  to  specify  corresponding 

q 

approximations  to  P(n  |Z,  ) given  by  (3,3),  in  order  to  complete  the 

q x 

calculation  of  x(k|k)  given  by  (3.2).  These  approximations  are  discussed 


-aft 


below; 
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Random  Sampling:  In  this  method  f(Z,  |tt  ) in  (3.3)  is  approximated 
by  the  denominator  of  Equation  (B.6)  ($ee  also  Equation  (B.8)  and 
Property  P3  in  Appendix  B)  for  a fixed  value  tt  . The  procedure  may  then 
be  repeated  for  each  admissible  n , q = 1,2,..., M.  This  computation  is  a 
by-product  of  the  calculation  performed  to  obtain  x(klk;n  ). 

q 

Pseudo-Bayes : In  this  case  we  use  Equations  (3.10)  (given  below) 

and  (3.8)  to  obtain  an  approximation  to  f(Z,  Jtt  ) for  a fixed  tt  . Equation 

K q q 

(3.10)  is  a pseudo-Bayes  approximation  (see  Appendix  A)  to  (3.9) 


f ^Zkl2k-l,TTq^  = 2 f(zjJ  ik=^,Zk-l,TTq^P^ik*^lZk-l’7Tq^ 


(3.10) 


q 2=1  K'  K K-l'  q'  ' k K-l*  q 

The  pseudo-Bayes  approximations  to  the  quantities  on  the  right-hand  side  of 
(3.10),  for  a fixed  n , are  given  in  Appendix  A.  Again  these  computations 
are  by-products  of  the  calculations  performed  to  obtain  x(k|k;TTq). 

Detection-Estimation : In  this  approach  the  denominator  of  Equation 

(2.18)  is  taken  to  be  an  approximation  to  f(ZjJn^)  for  a fixed  rr^.  Once 
again  this  computation  is  a by-product  of  the  calculation  for  obtaining 
x(k|k;nq). 

Other  approximations  based  on  those  given  in  [27]  and  [67]  may  also  be 
used.  These  are  closely  related  to  the  pseudo-Bayes  approximation. 

3.4.  Convergence  of  Transitions  Matrix  Estimates 

In  this  section,  the  conditions  under  which  the  a posteriori 
probability  of  the  true  value  tt  of  the  transition  matrix  converges 
(weakly)  to  1 are  investigated.  Note  that  the  solution  given  in  Section 
3.3  is  optimum  in  the  Bayes  sense,  at  every  step  k,  and  the  optimality, 
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therefore,  is  independent  of  convergence.  It  is,  however,  of  interest  to 

know  conditions  under  which  the  solution  converges  to  one  that  uses  the 

correct  transition  matrix  value,  since  it  makes  the  asymptotic  solution 

independent  of  the  a priori  probabilities  P(tt^). 

The  results  are  developed  via  weak  consistency  of  maximum  likelihood 

(ML)  estimate  of  tt.  The  constrained  maximum  likelihood  estimate  of  tt 

given  observation  Z is  defined  as  tt(Z  ) which  satisfies  the  equation 
n n 


f(Z  |tt(Z  ))  = max  f(Z  | tt)  , * = [tt.  , . . . ,ttm3  • 

n n n(sQ.  n>  1 M 

A sequence  of  estimates  {tt(Z^),  1 < n < •}  is  said  to  be  weakly  consistent 
(or  consistent  in  probability),  if  for  any  positive  6,  e,  there  exists  an 
N(e,5)  < • such  that  for  n > N(e  ,6  ) P{d(TT(Zn),TTc)  > 6 3 < e where  d is  an 
appropriate  distance  function  and  rrt  is  the  correct  value  of  tt. 

The  following  conditions  are  imposed: 

(Hi)  The  state  transition  matrix  A in  the  system  model  (1.1)-(1.2)  has 
all  its  eigenvalues  inside  the  unit  circle. 

(H2)  The  noise  covariance  matrices  defined  in  Section  1.3  are  positive 
definite.  That  is,  > 0,  > 0;  i = 1,2,...,S,  where  B > 0 

implies  that  B is  a positive-definite  matrix.  A consequence  of  this 
is  that  there  exist  upper  bounds  QU  and  RU,  and  lower  bounds  QL 
and  RL,  on  and  R^\  respectively,  such  that 

QU-Q(1)  2 0,  RU-R(i)  20,  i - 1,2,... ,S 

QL-Q(i)  < 0,  RL-R(i)  < 0 

QL,RL  > 0. 


-*  . 
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Here  B s 0 implies  that  B is  a positive-semidef inite  matrix  and 
B < 0 implies  that  B is  negative-seraidef inite. 

(H3)  The  dynamical  system  (1.1)-(1.2)  is  completely  observable  and 
completely  controllable  [28]  when  the  driving  noise  w^  and  the 
measurement  noise  v^  are  independent,  white  Gaussian  sequences 
with  "constant"  covariances  QL  and  RL,  respectively.  Hence,  the 
same  holds  when  w^  and  v^  have  covariances  QU  and  RU,  respectively, 
for  all  k. 

(H4)  The  homogeneous  Markov  chain  governed  by  the  transition  matrix  tt, 
tt  € . . • ,ttm3  , is  irreducible  and  the  states  of  the  chain 

are  aperiodic  [19]. 

(H5)  Suppose  that  the  system  (1.1)-(1.2)  begins  at  initial  time  k = kQ. 
Let  Ij(n,k)  represent  a specific  sequence  from  the  set  of  Markov 


chain  state  sequences  I(n,k)  ^ [i^,i^+^, . . . ,i^] , n s k, 

"n,k 


j ■ l,2,...,Sn  . Let  Zn  ^ = [zk,kQ  < k < n]  and  the  following 

o 

is  assumed: 


lim  P(I.(n,k)  tt  ,Z 


k - -a 
o 


3 


q’  n,k 


) = P(I,(n,k)|rr  ,2  ) a.e.  P , 


j 


q n. 


TT* 


where  n,k  > -aa  are  fixed,  and  P is  a probability  measure  induced 


rr 


on  the  range  space  of  the  random  sequence  {...,z.,...,z  } 

k n 

parameterized  by  the  correct  value  of  the  transition  matrix  tt. 
(Note  that  here  we  are  using  the  same  notation  for  a random 
variable  and  its  values.) 
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(H6)  The  inequality 


lim  f(2JZn-l’V  + lim  f(ZnlZn-l’TTt) 

holds  with  nonzero  probability,  for  all  tt  € [tt,  , . . . ,ttw}  , tt  t tt  , 

q 1 M q t 

where  Z = [z,,  0<  ks  n}  . 
n k 

Remark  3.1;  A consequence  of  condition  (H4)  is  that  lim  tt11  = tt  exists 

n ->  ® 

and  is  unique,  i.e,,  there  exists  a unique  stationary  distribution  for 
the  Markov  chain  corresponding  to  tt.  Then  lim  = independent  of  j, 

where  p^k  is  the  transition  probability  from  state  j to  state  k in  n 
steps  [19].  Here  a^  is  the  steady-state  probability  of  the  Markov  chain 
being  in  state  k.  Further,  the  distribution  of  the  entire  (Markov  chain) 
process,  with  arbitrary  initial  distribution,  converges  to  the  distribu- 
tion of  the  stationary  process  with  transition  matrix  tt  and  initial 

00 

distribution  given  by  a^'s  (see  Proposition  7.12  in  Breiman  [10]). 

The  following  theorem  states  the  weak  consistency  of  the 
constrained  ML  estimate  of  tt. 

Theorem  3.1:  Suppose  that  conditions  (H1)-(H6)  hold.  Suppose  that  the 
noise  sequences  [w^]  and  {v^}  in  model  (1.1)-(1.2)  have  zero  means.  Then 
the  constrained  maximum  likelihood  estimate  tt(Z  ) of  tt  converges  in 
probability  to  tt  as  n — «. 

Tse  and  Anton  [63]  have  given  sufficient  conditions  for  the  weak 
consistency  of  ML  estimate  to  hold.  We  shall  show  that  these  conditions  are 
satisfied  for  system  model  (1. 1)— (1.2)  under  the  conditions  (Hl)-(H6). 


Several  preliminary  results  are  needed  first. 
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Suppose  Chat  a linear  discrete-time  system  is  described  by 

Xk+1  ' Axk  + Bwk 
Zk  “ Cxk  + Vk 

wk~  N[0,Q(k)],  vk~  N[0,R(k)],  xQ  ~ N[xq,P(0)]  (3.11) 

whereas  its  model  contains  noise  covariances  modeling  errors,  namely, 

w,  and  v.  are  assumed  to  be 
k k 

wk~N[0,Qc(k)],vk~  N[0,Rc(k)],x0~  N[x0,Pc(0)],  (3.12) 

where  R(k),R  (k)  > 0;  Q (k)  * 0;  P(0)  * 0,  Pc(0)  ^ 0. 

The  use  of  the  model  (3.12)  results  in  an  MMSE  filter  given  by 
x(k+l | k)  * Ax(k| k) 

x(k| k)  = [I  - Kc(k)C]x(k|k-l)  + Kc(k)zk  (3.13) 

where 

x(k+l|k)  A Ec{xk+1|Zk}+ 
x(k|k)  & Ec{xk|Zk} 

Zk  ^ [Zj,  ^ * 0,1,. ..,k] 

K (k)  - P (k|k-l)CT[C  P (k|k-l)CT  + R (k)]"1  (3.14) 

c c c c 


^Subscript  c implies  that  the  expectation  is  taken  assuming  model  (3.12) 
to  be  true. 


-4.  ■ • - - - r - 
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and 


P (k+l|k)  * A Pc(k|k)AT  + BQc(k+l)BT 
Pc(k|k)  - [I  - Kc(k)C]Pc(k|k-l) 

- [I  - Kc(k)C]Pc(k|k-l)[l  - Kc(k)C]' 


+ K (k)R  (k)K‘(k) 
c c c 


Pc(0|0)  - Pc(0) 


(3.15) 


Now  the  computed  matrix  Pc (k|  k.)  is  not  the  estimation  error  covariance 
matrix,  since  the  filter  model  differs  from  the  real  model.  Neither  is  this 
filter  the  MMSE  filter  for  the  actual  system  (3.11).  The  actual  estimation 
errors  are 


x(k+l|k)  ft  *^+1  " x(k+l|k)  = Ax(k|k)  + 


x(k|k)  ft  x - x ( k | k ) = [I-K (k)C]x(k| k-1)  - K (k)v 


(3.16) 


with  covariances  defined  by 

P( k | k)  = E{x(k|k)xT(k|k)] 


P(k+l|k)  ft  E[x(k+l|k)xT(k+l|k)} 


(3.17) 


and  satisfying  the  relations 

P(k+l|k)  - AP(k|k)AT  + BQ(k+l)BT 

P(k|k)  - [r-Kc(k)C]P(k|k-l)[l-Kc(k)C]T  + Kc(k)R(k)K^(k)  (3.18) 

Lemma  3.1:  Let  z(k|k-l;j)  denote  the  prediction  of  based  on  the 

observation  sequence  assuming  that  results  from  model  (1.1)- 

(1.2)  with  Markov  chain  state  sequence  Ij(k)  in  effect,  where 
k k+1 

lj(k)  ■ (in]n-0,  j * 1,2,...,S  , and  i^  represents  state  at  time  n. 

Let  Iy(k)  represent  a state  sequence  in  which  the  states  correspond  to 
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driving  noise  covariance  QU  and  measurement  noise  covariance  RU,  for  all 

time.  Similarly,  let  I^(k)  represent  a state  sequence  in  which  the 

states  correspond  to  noise  covariances  QL  and  RL  for  all  time.  Then 

P,(k|k-1;L|L)  < P (kjk-1; j |L)  < P (k|k-l;j|j),  V j and  for  k s 0, 
z z z 

where 

Pz(k|k-1;L|L)  = E((zk-z(k|k-l;L))(zk-z(k|k-l;L))T|lL(k)]+ 

Pz(k|k-1;J|L)  - E{(zk-z(k|k-l;j))(zk-z(k|k-l;j))TllL(k)3 

and 

Pz(k|k-l;j|j)  £ E{(zk-z(k|k-l;j))(zk-2(k|k-l;j))T|l.(k)} 

Proof : The  desired  results  will  be  proved  using  models  (3.11)  and  (3.12). 
From  (3.15)  and  (3.18),  let 

T(k| k)  * P(k| k)  - P (k|k) 

T(k+1 | k)  - P(k+l|k)  - P (k+l|k). 

Then 

T(k+l|k)  * AT(k| k)AT  + B[Q(k+l)-Qc(k+l)]BT 
T ( k | k)  = [l-Kc(k)C]T(k|k-l)[l-Kc(k)C]T 

+ Kc(k)TR(k)  - Rc(k)]K^(k)  (3.19) 

Now  suppose  Q (k)  * Q£(k)  and  R(k)  s R,(k)  Vk.  Then  if  T(k|k-1)  * 0,  it  is 
evident  from  (3.19)  that  T(k|k)  * 0 and  T(k+k|k)  s 0.  Therefore,  by 
induction,  we  have:  If  P(0)  ^ Pc(0)>  Q(M)  s Q£( k)  and  R(k)  s Rc(k)  Vk, 
then  P(k| k)  * P (k|k)  and  P(k+l|k)  * Pc(k+l|k)  Vk.  Similarly,  if 
P(0)  2 P (0),  Q(k)  2 Qc(k)  and  R(k)  £ Rc(k)  Yk,  then  P(k|k)  * Pc(k|k) 
and  P(k+l|k)  s Pc(k+l|k)  Yk.  Further,  similar  results  hold  if  we  compare 

’kE{(zk-z(k|k-l;L))(zk-z(k|k-l;L))T|lL(k)}  denotes  expectation  of 
(zk-z(k|k-l;L))( — )T  assuming  Markov  chain  state  sequence  IL(k)  to  be  in 
effect  in  model  (1.1)-(1.2). 
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[ 

i 


1 

.2  1] 

1] 

actual  estimation  error  covariance  matrices  of  two  different  "mismatched" 
filters.  This  proves  the  following: 

Px(k|k-l;j|L)  * Px(k|k-l;j| j),  V j , k 

where 

Px(k|k-l;j |l)  = E{(xk-x^(k|k-l))(xk-x.(k!k-l))T|lL(k)}  etc. 

(Recall  condition  (H2)  in  Section  3.4.)  Now  since 

Pz (k| k-1; j | L)  - CP  (k|k-l;J |l)CT  + RL,  etc. 
we  have  the  following: 

Pz  (k|  k-1;  j | L)  * Pz(k|k-l;j|  j),  ?j,k  . 

Since  a Kalman  filter  matched  to  the  true  model  statistics  is  a MMSE 
estimator,  we  have 

Px(k| k-l;L| L)  * Px(k|k-1; j | L)  j = 1 ,2 , . . . ,Sk+1 ; k = 0,1,...  Q.E.D. 

Note : Proof  of  P (kJk-l;j|L)  s Px (k J k-1; j | j ) follows  the  proof  of 
Theorem  7.6  in  [28]. 

Lemma  3.2:  Let  condition  (H3)  in  Section  3.4  be  satisfied.  Then  there 
exist  positive  real  numbers  4^  and  6^  such  that 

0 < 6XI  * Px(k|k;j | j)  * 62I,  V k ^ N > 0 and  V j . 

Proof : Since,  by  condition  (H3),  system  model  (1. 1)-(1 .2)  is  completely 
controllable  and  completely  observable  for  the  driving  noise  w^  and  the 
measurement  noise  v^  having  covariances  QL  and  RL,  respectively,  for  all 
time,  it  follows  from  Lemma  7.2  in  [28]  (see  also  [25]  and  [14])  that 
there  exists  6^  > 0 and  integer  N > 0 such  that 

Px(k|k;L|L)  * fijl  > 0,  V k * N.  (3.20) 

Now  consider  a suboptimal  estimator  which  uses  a fixed  estimate  x^  for 
all  time  and  for  all  possible  Markov  chain  state  sequences;  that  is,  we 

!] 

— j 
? 1 
i ] 

* . J 
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replace  x^(k|k)  by  x^  for  all  k and  j.  Since  Kalman  filter  is  the  optimal 
MMSE  estimator,  we  have 

Px Ck ) k ; j J j > £ eRj^-Xq)  (xk-x0)TJlj  (k)}  Yj  and  k (3.21) 

Now  from  conditions  (HI)  and  (H2)  there  exists  a positive  number  M < ® 
such  that 

i!E{(xk-x0)(xk-x0)T|lj(k)}||  SM,  V j,  k (3.22) 

Therefore,  from  (3 .2 1) - (3  .22) , there  exists  6^  > 0 such  that 

Px(k|k;j|j)  * 62l,  V j,  k (3.23) 

Hence,  from  Lemma  3.1,  we  have 

0 < Sjl  * P (k | k ; j | j)  5 62I,  V k 2N  > 0 and  V j Q.E.D. 

Lemma  3.3:  Given  the  dynamical  system  (1.1)-(1.2),  the  observations  z^ 

and  z become  uncorrelated  as  | k-n | — for  all  k,n  ^ 0. 
n 

Proof:  It  is  sufficient  to  show  that  states  x,  and  x are  asymptotically 

x n 

uncorrelated.  From  (1.1),  it  follows  that 


E{xxT|l .(k)}  = A*V  (0) (AT)n  + E Ak_1"1BQ.(i)BT(AT)n‘i'1 
k n j x q j 


a ,T'  | k*i*l 

where  V^(0)  “ E{xqXq|I  (k)} , k s n,  j = 1,2,...,S 


E{x  xT|l  .(k)}  = Ak'n(AnV  (0)(AT)n)  + Z (An'1'1BQ.  (i)BT(AT)n':L'1) 


. T.n 


n- 1 


n-i-l„„  .„T..T,n-i-l, 


k n j 


i=0 

< ,k-n,,n,r  /n\/AT.n.  , .k-n-.n-i-l  _ „T/A T.n-i-1. 

SA  (A  v (0)  (A  ) ) + I A (A  -B-QU-B  (A  ) ) ,Vj 

X i-0 


{Q  j (i ) } k is  the  covariance  sequence  corresponding  to  [w. } in  the 


i-0 

state  sequence  I^(k). 


1 i-0 
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It  follows  that 

E{x  xT|l.(k)}  - 0 as  | k-n | - ® uniformly  in  I . (k)  since,  by 
k n j j 

condition  (Hi),  A has  all  its  eigenvalues  inside  the  unit  circle.  Hence 


T i 

E{x^xn)  -*  0 as  | k-n  | 


Q.E.D. 


Remark  3.2:  Since,  given  I . (k) , [z  } is  a Gaussian  time  series,  it  follows 

J k 

from  the  proof  of  Lemma  3.3  that  z^,  z^  become  asymptotically,  conditionally 
independent,  uniformly  in  j,  given  the  state  sequence  I (k) , 
j = 1,2 , . . . ,Sk+1 ; k > n. 

Lemma  3.4:  Let  the  dynamical  system  (1.1)-(1.2)  begin  at  initial  time 


kQ  = *m  (instead  of  kg  = 0).  Then,  for  k s > -00,  the  observation 
sequence  {z^}  is  stationary. 

Proof:  Let  Z = fz.  I?1  , and  let  its  joint  probability  density  func- 

n,K^  tC  K~K^ 

tion,  assuming  n to  be  the  transition  matrix,  be  given  by 


f7  (91  j tt)  * E f 

n,kx 


Z£Z  „ ll.(n,k1)®P(IJ<“-kl)l’T) 
j n,k1l  1' 


(3.24) 


n-k.,+1 


where  a = {or.}.  , * denotes  value  of  Z , ; I.(n,k,)  represents  a 

- l i=l  n,k^  j 1 

specific  state  sequence  starting  at  time  k1  and  ending  at 
n-kj+l 

n,  j = 1,2 , . . . ,S  ; and  f i . , .(•)  is  the  density  function  of 

n,k111j'‘n,Kl; 


Z , given  I.(n,k-).  We  need  to  show  that 

n,k^  j 1 


Jn,k. 


(a)n) 


"n+4 ,k^+2 


(a|n),  a k^;£  S 0;  V a. 


(3.25) 


The  right  side  of  (3.25)  may  be  expressed  as 


"n+£ ,k ^+1 


(a|tr)  = E f, 


1 ^k+llI1^*kl+l><i)P<IJ(n+l,kl+t>|TT) 

j n+x,kj+x  j 1 (3.26) 
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where  1^  (n+2  .k^+2)  is  a shifted  version  of  I^n.k^).  From  condition  (H4) 
in  Section  3.4,  it  follows  that 

n-k.,+1 

P(Ij(n+£,k1+£)|-)  = P(I  (n.k^ln),  j = 1,2, ...,S  ,2  20.  (3.27) 

Furthermore, 


fzn,k1ll3<"-kl)fe>  ' \lI3<kl*kl)<"1> 


x J}  *z  \z  . , I.(m,k  )(am+l-k)'G;m-k1,...,al) 
m=k^+l  m m-l,k^,  j 1 1’  ’ 

(3.28) 

where  I (m,k1 ) is  a subsequence  of  I.(n,k. ) and  » is  the  value  of  2 , 

J 1 j l i k^+i"l. 

From  Lemma  3.3  and  condition  (H4)  (and  Remark  3.1)  in  Section  3.4,  it 

follows  that 


,(«,)  = f 


2,  |I. (k.  ,k,)v  l'  z,  , < 1 1 , (k,  +2 , k.  +2 ) ' 1 


(or  ) V 2 2 0 


(3.29) 


k1‘  j v 1’  1' 

with  similar  results  for  other  terms  in  (3.28).  Consequently,  it  follows  that 


(a) 


(3.30) 


which  proves  that  (3.25)  holds.  Q.E.D, 

Lemma  3 • 5 : Let  condition  (H3)  in  Section  3.4  be  satisfied.  Then  the 

Kalman  filters  matched  to  state  sequences  I^(k)  (that  is,  Kalman  filters 

designed  assuming  state  sequences  I^(k)  to  be  true  in  model  (1 . 1) - (1 .2 ) ) , 
k+1 

j = 1,2,...,S  , are  uniformly  asymptotically  stable,  uniformly  in  k and 

in  j. 

Proof : The  Kalman  filter  matched  to  the  state  sequenc  I^(k)  is  given  by 


the  following  equations 


x (k|k)  = 1 (k,k-l)x  (k-l| k-1)  +K.(k)z 
J J J J K 

K.(k)  = P (k|k-l;j|j)CT[CP  (k|k-l;j|j)CT  + R (k)] 

J X X J 


-1 


(3.31) 

(3.32) 


Px(k|k;j|j)  = 7 (k,k-l)Px(k-l|k-l;j| j)*J(k,k-l)  + F (k) 


(3.33) 
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F.(k)  = [I-Kj(k)C]BOj(k)BT[I-K.(k)C]T  + Kj(k)Rj(k)K^(k)  (3.34) 

k+1 

where  j = 1,2,.  ..,SK  ; ^(k.k-l)  = [I-K.(k)C]A  and 

Y (M)  = Y (k,k-l)Yj(k-l,k-2)...¥j(j8+l,i). 

The  Kalman  filter  given  by  equations  (3 .31)- (3 .34)  is  uniformly 
asymptotically  stable  in  the  large  if  there  exist  real  scalar  functions 
Vj  (x^.  (k|  k)  ,k) , Vjj  (!!xj  (k | k)  I! ) , V2  j (l!xj  (kl k)  II) » and  Y3 j ( (k | k)  ) such 

that  for  some  finite  N ^ 0 

0 < V,  . ( i!x  (k|  k)  ti ) s V (x  (k|k),k)  s V,  , (I'x.  (k|k)  ),  x (k|k)  # 0 

ijj  JJ  ^ J J J (3.35) 

V (x  (k|k),k)  - V (x  (k-N|k-N),k-*0  « V3J (ll*j  (k|k)||)  < 0 

k * N,  x.(k|k)  t 0 (3.36) 

and,  in  addition, 

Yij(°)  * V2 j (°)  = 0.  lira  Y^P)  ’ “ » (3.37) 

(see  [28],  [18],  [30]).  It  has  been  shown  in  [18],  [28]  that  an  appro- 
priate Lyapunov  function  is  given  by 

V (x  (k | k) , k)  = x^(k|k)P  1 (k | k ; j | j)x  (k | k)  (3.38) 

From  Lemma  3.1  and  Lemma  3.2  we  can  find 


Y1(i:Xj(k|k)||)  ^ Y1j(iiXj(k|k)||) 

Y2 (I'Xj  (k | k)  !|)  s Y2  j ('!*j  (k | k)  ll) 

to  satisfy  (3.35),  (3.37)  for  all  j, 
the  solution  given  in  [18]  and  using 
such  that 


V j 

V j 

k+1 

j = 1 , 2 , . . . , S . Further,  following 
Lemma  3.1  and  Lemma  3.2,  we  can  find 


Y3j(ilx  (k|k)|!)  s Y3  ( ' Xj  (k  | k)  i|)  < 0 . 


^ I 


■UN 


■ ■ — U I 
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Then  Che  Kalman  filters  matched  to  I^(k)  are  uniformly  asymptotically  stable, 
uniformly  in  k and  in  j (see  Theorems  4.2  and  4.3  in  [68]).  That  is,  given 
e > 0 S 5 (s)  > 0 and  T(e,5  ) > 0 3 ||jL(k|k)||  < e for  all  k s l + T(e,5) 
if  ||5L(.i|jt||  < 5,  for  all  £ a 0 and  V j,  where  we  consider  the  system  given  by 


Xj(k|k)  = Y j (k| k-l)Xj (k-l| k-1) 


(3.39) 


q.e.d. 

Remark  3-3:  An  important  consequence  of  Lemma  3.5  is  that  (see  Theorem  3 
in  [30])  there  exist  positive  constants  c^,  c^  such  that 


l'ij(k,4)ii  s c1  exp[-c2  (k-2)]  V j;  k U 


(3.40) 


Further,  the  effect  of  initial  statistic  Pg  and  estimate  x(0|0)  is 


forgotten  as  more  and  more  data  are  processed.  Moreover,  this  happens 
uniformly  in  all  possible  state  sequences. 

Lemma  3 • 6 : (i)  Let  the  dynamical  system  (1.1)-(1.2)  begin  at  initial 

time  kg  = L.  Then 

,llm»  , A|9*-m,t,)  ■ £z. \z  , a-e-  £ 

i - -=°  k k-1 ,1  k k-1,-®  t 

where  f i (•(•,tt)  Is  the  density  function  of  z,  given 

Zk|Zk-l,i  * 


r «*k-l 

Z,  , 4 58  iz  } t assuming  the  transition  matrix  tt  to  be  true  with  or,  . 0 
iC *•  i f **  n na-c  k j** 

appropriately  defined. 

(ii)  Let  the  dynamical  system  (1.1)-(1.2)  begin  at  initial  time  kg  * ■*. 
Then  the  random  sequence  |2  ^Zk^Zk-l,TT^^k=0  becomes  asymptotically 

c i k*  1 

stationary,  where  * l znJ n=o  * 
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Proof : (i)  We  have 

fz,|Z,  (0fJ-k-l,2 ,TT)  “ zz  \z  i fk  2)(aJ-k  1 

k k-l,£  ’*  Ij(k,i)  k1  k-l,2rj'-K,z;  * k L»*  j 

x P(Ij(k,2)|ak_1^,rr)  (3.41) 

whera  Ij(k,2)  is  the  state  sequence  beginning  at  l and  ending  at  k, 

J ■ 1,2 s ; ki  *•  Further’  \|\.U.IJW)^U'IJW(I 

is  a Gaussian  density  function  with  mean  s[z  Ja.  , ,I.(k,2)},  and 
covariance  given  by  an  equation  similar  to  that  for  P (k |k ; j | j ) Fj 


■ rom 


Lemma  3.5,  we  have 
lim  f 


Vi.— Ij(k--)("kls!‘-i.— Ii<k,‘")> 


= f 


(3.42) 


uniformly  in  j,  for  all  bounded  infinite 


sequences  a Now  define 


From  (3.42)  given  «x  > 0 3 ^(6^  = such  that 

lVk-M’Yk’‘>(^IVl-i'IJ<k,'t,> 


- r 


Zkl2k-l,k-N1(€1)'Ij(k>k-NiCs1))CC<k'-k-l,k-N1(e1),Ij(k'k-Nl^i»>  I < «L 
for  i <k-N(€1),  uniformly  in  j,  j - 1,2, . . . |Sk_2+1.  Therefore 


))  X 


|4il“'lJ(k!k.»1)I'titlzk.l,k.Ri.lJ(k.k-»l)(“k|Sic-l,lt-Kl>Ij(i>k-»i 

.P(Ij(,c,k-N1)^>k.1>2’,")’P(Ij(k»k-N1)  bLk.1>m>rr)}]|  ^26^  Y 2,m  < k-N^s^  (3.43) 
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From  condition  (H5),  given  > 0 3 N2^ei’S2^  = Nj  such  that 
|p(Ij(k,k-N1(el))|ak_M,TT)  - P(I  (k,k-N1(e1))|ak.1>m,TT)|  * e^a.e.  p2 


(3.44) 


for  4,m  < an<3  for  all  j • (Note  that  there  are  finite 


N1(«1)+l 


) number  of  state  sequences  of  length  N(e^)+1.)  Therefore, 


from  (3.43),  (3.44)  and  Lemma  3.2,  we  have 


S2e!  +S 


N1(«1)+l 


x ^X  M for  4,m  < k-N,  N = maxtN^,^)* 


where 


fzklV1,k-N1«Ij(k«k^)(“k9Lk-1,k-Nl,Ij(k,k'Nl))l  SM  Vk’j,N  aenma  3'2> 

Noting  that  the  space  of  all  bounded,  infinite  sequences  carry  probability 


one,  it  follows  that  Lemma  3.6(i)  holds. 


Q.E.D. 


°° 

(ii)  From  Lemma  3.4,  the  observation  sequence  is  stationary. 

Extend  this  single-ended  process  into  a double-ended  stationary  process 
lz^i^__00  such  that  £ } ^_q  and  ^zk^k~0  kave  tke  same  distribution  (see 

<30 

Proposition  6.5  in  [10]).  To  avoid  cumbersome  notation,  denote  lzk}k=_a 

by  [zk3k=.oo-  define  Z = lzkik=£,  n ^ t . Now  for  any  finite  4, 

f | (*,n)  is  a measurable  function.  Since  f i (*,11) 

Zk  Zk-1 ,4  zk  Zk-1,4 


converges  pointwise  a.e.  to  f 


akrk-i,-» 


(•,tt)  as  4 -*  it  follows  that 


“ co  a>  f 

f |_  (,»TT)  is  also  measurable  mapping  from  ( U (R  ,/9  ))  - (R ,8) 

Zk|Zk-l,-»  s-1 

(see  [5,  Theorem  1.5.4]).  Then  from  Proposition  6.6  in  Breiman  [10], 


I z (zJzk-i,-®,rr)  is  statlonary»  k = 

k - 1 , -® 


Q.E.D. 


R is  the  real  line  and  8 is  the  corresponding  Borel  a-field.  R is  the 
space  of  infinite  sequences  of  real  numbers  and  3°  is  the  corresponding 
Borel  7-field. 
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Remark  3.4:  (i)  From  Lemma  3.1  and  Lemma  3.2  we  have 


0 ~ f2  |Z  (Zk'Zk-l,1T)  “ Mi  < "*V  k ^ °’ 
1c  1c-  1 


Therefore,  from  the  dominated  convergence  theorem  (Ash  [5]), 


lim  f | (z.  |z..,n)  iS  inCeSrable< 

k - ® zk  |4Tc-l 


(ii)  Since  Zn(  •)  is  a continuous  function,  it  readily  follows  that  the 

random  sequence  [in  fg  i (z^  lZk-l  ,TT^k~0  becomes  asymptotically 

k*  k-1 

stationary. 


Lemma  3.7:  Consider  the  double-ended  stationary  process  (z^}  » as  defined 

00 

in  the  oroof  of  Lemma  3.6.  Define  Z , = fz,  1,  . , n 2 1.  Then 

n,l  kJk=l 


In  f | (zjz.  , ,tt)  is  integrable,  k * 0,1,...  . 

k1  k-l,-»  K 


where  subscript  l on  z^(k|k-l;j)  and  P (k|k-l;j|j)  indicates  that  these 

quantities  have  been  computed  for  the  observation  Z . . Then 

ox 

^ f(zklZk.l,2*Ij(k'i>>  “ * f Zn2n  - \ Za I ?zi^k I k - 1 ; j |j) | 

- ^(zk-Sx(k|k-l;j))TP^(k|k-l;j|j)(zk-zjt(k|k-l;j)) 
s - |2n2TT  - I 2n32  - j •||*k-*i(k  |k-l;  j)||2  (3.47) 
where,  from  Lemma  3.2,  there  exists  (3  , ^ > 0 such  that 

3ir  ~ P*i<klk“1;Jb)  “ *21’  l~  k-H;  V k,j  (3.48) 

In  a manner  similar  to  (3.31)  we  have 

k-1 

*ji(k|k)  = ■?j(k,A)Xj^(i|i)  + 2 ^ (k,i+l)K  ^(i)z^  (3.49) 

i*  ij 


where 


;!  Y (k,2)  ;|  < c^expt  -c2(k-£)  ] , c2,c2  > 0 (see  Lemma  3.5);  f j, 
which  with  (3.49)  results  in 

*2(k|k-I;J)  - C i.2(k|k-l)  = CAxjX(k-l|k-l) 


k-1 


CATJ(k,2)*Jjl(i[X)+  2^CAYj(k,i+l)KjX(i)zi  . 


(3.50) 
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Therefore,  the  norm  of  the  prediction  error  satisfies 


zk-z^(k|k-l; j):'2  ^ zR  2+2'i  z^l*  z^(k|k-l;j)H+!;z^(k|k-l;j)  2 

s i|zk  2+2'i|  zk!|-|'cA!!-c1[exp(-c2(k-i))-  ■ 
k-1 


+ SI^exp(-c2  (k-i))  - ilKjX(i)||  * HzJLtl] 


+ j| CA|| 2 • c2exp[-2c2(k-x)]  . ||3c  (Z | jtl| 


+ C 


CA  2-c2-  z X £(iU)'  -!!k  (i)  :-!!zt 


k-1 


J* 


• exp[-c2 (k-i)]exp[ -c2 (k-i-1) ] 

+ Z Z i'cA-H2  * !Ik  . (i ) ||  * ||K  (m)||-!!z  ||‘jiz  <|.c? 

i=2  m =*  ^ i m 1 

. exp(-c0 (k-i-1) ) -exp(-c2 (k-m-1)) 

Noting  that  K,.(i)  ! < M_  < ®,  V l , j , i and  "caII  < M,  < ®,  we  have 


(3.51) 


VJ* 


k-1 


z.  -z 


k l 


(k | k-1 ; j ) |2  s z 2 + M,  £ <Iz.II’  z ||-exp(-c  (k-i)) 


6 

k-1  k-1 

+ M-j  £ £ ■ ' z . ! | • z -exp(-c.  (k-i-1)  )exp(-c_  (k-m-1) ) 

7 i=-oo  m=-«  i m 2 2 


g(zk).»)>  V *>J 


(3.52) 


where  and  are  appropriate  positive  constants. 

Now  we  claim  that  ]e  {g(Z,  )]|  < « where  the  subscript  t indicates  that 

t K,  “ 

the  expectation  is  taken  w.r.t.  distribution  corresponding  to  the  true 
stochastic  matrix  n . In  order  to  prove  this  claim,  note  that 
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!tC!!*k!H!*i!|}  * CecCI!*^2}]  - ^ y m. 


Hence  from  (3.52)  it  follows  that 

0 < Sc£s(Zk  _)}  ^ Mg  + M6-Mg*  Z exp(-c2(k-l)) 

* • oo 


k-1 


/ k-1 


\2 


^My.Mg 

Therefore,  from  (3.47),  (3.52)  and  (3.53),  one  obtains 


Z exp(-c,(k-i-l) ) < 

i— • “ / 


in  f(ak|Zk_1>i,Ij(k,£))  * h(Zk>_-)  Y l, j 


where 


and 


h(Zk^_aJ)“*2  2n  2rr  - 2 2n  32  2 ^1  ®(zkf-«) 


lst-h(Zk,-«)}!  < * • 


(3.53) 

(3.54) 

(3.55) 

(3.56) 


The  use  of  (3.54)  and  (3.55)  yields 


In  f(2k!zk.1>2»TT)  * h(Zk,-« } 


Further,  from  (3.45),  (3.46)  and  (3.48),  we  have 


(3.57) 


, I*  . m . . , in  P,  ■ constant,  Y l ..... 

in  f (zk!  zk_i  2 ^ ” 2 2,1  ”2  ^ (3-58) 

Then  from  Lemma  3.6(1),  Equations  (3 .56 ) -(3 .58)  and  Fatou's  Lemma  (Ash  [5]), 


the  desired  result  follows. 


Q.E.D. 


-f  — n 

Remark  3.5:  Consider  the  single-ended  stationary  process 

resulting  from  the  dynamical  systam  (1.1)-(1.2).  Let  Zk  - Then 

from  the  proof  of  Lemma  3.7,  we  have 


i 
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|EtUn  f(zk|zk_1,^)}|  * M1(J  < »,k  « 0, 1,2 j 


(3.59) 


EtCUn  f(zk|zk.1,TT)|]  < M1]_  < «,k  - 0,1,2,... 


(3.60) 


If  we  now  define 

yk  * 2n  f (zklZk.1'TT)  " EtUn  f (2klZk-l,TT)^ 

Chen  [yk}k=Q  is  an  asymptotically  stationary  (Lemma  3.6),  zero-mean 
random  sequence. 

Lemma  3 ■ 8 : p{n  ^ly^"1^"*''  * •+5rIJ  > "*  ^or  any  e > 0 , as  n - «. 


Proof : From  Feller  Vol.  II[20,  p.  240],  it  is  sufficient  to  show  that 


(3.61) 


Et{y2yk}  -0  as  |k-2 | 


Let  for  example  k > ■£,  then 


EJyiyk^  = S Et*-y£yk^j(k^  X P(Ij(k)lTTt)’  j = 1»2>-*-»sk+1  • 

X . (k) 

From  Lemma  3.3,  Remark  3.2,  Lemma  3.6(i)  and  Lemma  3.7,  it  follows  that 
lim  Et{y^y  |l. (k) } - 0 uniformly  in  j . 


Therefore, 


EtCyiyk}  - 0 as  | k-l | - ®. 


Q.E.D. 


Now  we  are  ready  to  prove  Theorem  3.1. 

Proof  of  Theorem  3.1:  According  to  Tse  and  Anton  [63]  , the  following 
6 conditions^"  are  sufficient  for  the  weak  consistency  of  the  ML  estimate 
to  hold: 


Condition  (AC6)  is  given  in  Tse  [64].  It  was  overlooked  in  [63] 


(AC1)  f (Z  Itt)  is  measurable  in  Z w.r.t.  measure  j.  (dZ  ) ^ f(Z  In  )dZ 
n n n n n1  t n 


where  n is  the  true  transition  matrix,  and  Z = fz  }n  „ . 
t n ^ i i-0 

(AC 2)  For  each  n € O’  and  all  n » 1,2,...,  Ec(in  fCz^jz^  ^,n)}  < », 

where  O'  = {^.....n^.  Also,  EtC  S f (zJ  Zn-l,TTt^  < °B»  n*1»2>' 

(AC3)  For  each  n 6 O', 

-1  n 

var{n  Z In  f (ZjJZj.  ,tt)}  - 0 as  n - 
k*l 


This  can  be  replaced  by  assumption  (AC31). 

(AC3')  The  observations  {z^}  are  asymptotically  uncorrelated. 


(AC4)  Define  the  set  #n(n)  ■ {Z^ifCZ^jn)  » 0] . Then  for  all  n,n'  € S’, 


we  have 


8 (n)  = B (n’>  V n = 1,2,...  . 

n n 


(AC5)  For  all  tt^  € fi'  , n^  # n£,  there  exists  an  infinite  set  G c I , 


such  that  the  inequality 

f(zJ2n-l’V  **  f(zniZn-l*TTt) 

holds  with  nonzero  probability  w.r.t.  n^  uniformly  in 

n € G.  (Note  that  I+  ■ set  of  nonnegative  integers.) 

(AC  6)  Define  uq  - 2n  * - 1*  f (2n!  Zn-1,TTt)  > "q  * V 

and  n ,n  € Q'.  Then  E „[u  } < 0 uniformly  in  n. 
q c c n 

Now,  conditions  (AC1)  and  (AC4)  are  obviously  satisfied  since  we  are 


dealing  with  Gaussian  mixtures.  Leosna  3.7  and  Remark  3.5  satisfy 
condition  (AC2).  Lemma  3.8  replaces  condition  (AC3)  since  (AC3)  is  used 
in  Tse  and  Anton  [63]  as  a sufficient  condition  for  the  weak  law  of  large 
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numbers  to  hold  for  the  sequence  {in  f(z^|2k_1,rr)}  Condition  (H6)  and 

Lemma  3.6  satisfy  (AC5)  (see  also  Remark  3.6).  It  remains  to  show  that 

condition  (AC6)  holds.  From  Lemma  3.6,  ur  -*  u^  pointwise  a.e.,  where 

u^  = lim  (in  f (Znlzn-i'nt)  " f (zJZn-l,TTt^  (3-62) 

n - “ 


and 


Et(uJ  < » . 


From  Equation  (46)  in  [63],  we  have 

Et{un)  < inEt{exp(un)} , n = 1,2,...,“.  (3.63) 

Also,  from  Equation  (47)  in  [63],  one  obtains 

Et{exp(un)}  = 1,  n = 1,2,...  . (3.64) 

Using  Fatou's  lemma  [5],  we  obtain 

lim  in  E [exp(u  )}  S in  Et[exp(uM)}  (3.65) 

Following  the  procedure  given  in  the  proof  of  Lemma  3.7,  and  using 

Fatou's  lemma  (or  the  dominated  convergence  theorem)  [5],  we  have 

lim  Et{un]  = Et{u35}  . (3.65) 

n ->  * 

Hence,  from  (3 .63) -(3 .65) , we  have 

E^uJ  < 0 . (3.67) 


Therefore,  condition  (AC6)  holds. 

Thus  we  have  proved  Theorem  3.1.  Q.E.D. 

Using  Theorem  3.1,  we  can  easily  establish  the  following: 


Note  that  Lemma  3.3  satisfies  condition  (AC3 ' ) . However,  it  is  not  clear 
how  (AC31)  can  replace  (AC3). 


A. 


V 
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Theorem  3.2:  Suppose  that  the  hypotheses  of  Theorem  3.1  hold.  Suppose  that 
every  n in  the  finite  set  » which  includes  the  correct  value 

it  , is  assigned  nonzero  a priori  probability.  Then 

lim  P (tt  | Z ) = 1 for  tt  = tt 
_ _ v q1  n q t 

n - 00 

= 0 for  tt  # tt 

q t 

in  probability 

Proof:  It  follows  immediately  from  Theorem  1 in  Yamada  [71]. 

Remark  3-6:  Condition  (H6)  is  needed  to  satisfy  an  identifiability  condi- 
tion [63]  (see  condition  (AC5)).  It  is  a sufficient  condition  for  two 
transition  matrices  tt^  and  rr^  to  be  "resolvable"  [63]  through  maximum 

likelihood  identification.  From  Lemma  3.6  f(z  Z , ,rr  ) becomes 

' n1  n-1  q 

stationary  for  large  n and  all  rr^'s.  So  either  condition  (AC5)  holds  or 

lim  f(z  jz  , ,n  ) = lim  f(z  |z  , ,tt  ) a.e.  . It  appears  that  if  tt 
_ n'  n-1  q ' n n-1’  t rr  q 

n-oo  n - 31  ^ 

and  tt^  are  such  that  tt^  ^ then  (H6)  is  satisfied,  because  then 

lim  tt11  f lim  tt”,  consequently,  the  observation  processes  {z  1 corresponding 
n ^ n 

to  these  tt's  become,  in  the  limit,  two  stationary  processes  with  different 
statistics  (see  Lemma  3.4).  However,  a proof  of  this  is  not  yet  available. 
Under  certain  situations  lack  of  resolvability  is  not  a problem  since 
identification  per  se  is  not  the  objective.  For  example,  consider  the 
problem  where  switchings  occur  only  in  the  driving  noise  statistics  of 
model  (1.1)-(1.2).  Then  ML  estimate  converges  to  a tt^  such  that 
x(k|k;TT^)  = x(k|k;ft)  for  large  k.  This  is  acceptable  if  state  estima- 
tion is  the  sole  objective. 

Rema rk  3.7:  Condition  (H5)  is  needed  in  Lemma  3.6  to  prove  the  pointwise 

convergence  of  f(z  Z , ) a.e.,  as  i •*  -®,  for  all  admissible  tt  's, 

® v n n-1, 4 q q 


A 
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where  system  model  (1. 1)— (1-2)  is  assumed  to  have  started  at  initial 

time  k = l and  Z „ = { z,  , l ^ k s n } . So  far  we  have  been  unable  to 
o n ,1  k 

find  a proof  for  this.  However,  from  heuristic  considerations,  this 

seems  to  be  true.  Note  that  system  model  (1.1)  is  assumed  to  be  stable. 

Lemma  3.2  ensures  that  f(z  Z . . ,tt  ) is  well  defined  for  all 

n n-1,4  q 

Z . , i -®.  Since  the  system  is  stable  and  the  Markov  chain  governing 

n y 

switchings  in  the  noise  statistics  is  ergodic,  it  follows  that  the  initial 

conditions  (initial  system  state  value  x^,  initial  noise  values  w^  and  v^ , 

and  initial  Markov  chain  state  i^)  have  no  effect  on  the  asymptotic 

behavior  of  the  system.  So  f(zn|zn_^  tt^)  should  converge  pointwise 

to  f(z  Z , n ) as  l -*  for  all  bounded,  infinite  sequences 

n n-1,-00  q 

{ ,z  .,2n, ,z  },  n > -00.  For  the  same  reasons,  (H5)  must  hold.  From 

■ i u n 

the  Martingale  theory  [5],  it  follows  that 

lim  P(I . (n,k)  | n ,Z  ) = P(I  (n,k)|Ti  Z a.e.  P^  . 

kQ-  » J q n>  o J q ’ q 

It  then  follows  that  there  exists  a set  of  bounded,  infinite  sequences 

{ ,z  .,z  ,--,z  1 for  which 

■ l u n 


lim  P(I , (n,k)  j tt  ,Z  ) = P(I  (n,k)|n  Z _ 


Ko 


q n,k 


q’  n,-»)  . 


However,  we  have  been  unable  to  show  (due  to  the  complicated  expression 

for  P(I.(n,k)|TT  ,z  , ))  that  this  holds  for  every  bounded,  infinite 
J q n»kQ 

observation  sequence.  Nevertheless,  it  does  serve  to  indicate  that  under 
intuitively  reasonable  condition  convergence  in  Theorem  3.1  is  attained. 
Remark  3.8:  Theorems  3.1  and  3.2  prove  the  convergence  of  the  optimal 
algorithm  given  in  Section  3.2.  But  in  practice  we  use  approximations 
given  in  Section  3.3.  From  properties  Pi)  (see  Eqn.  (B8))  and  P3)  in 
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Appendix  B,  it  follows  that  f (Z  | tt  ) can  be  approximated  arbitrarily 

k q 

closely  by  taking  sufficiently  large  N,  using  the  random  sampling  approach. 
So,  in  a sense,  convergence  of  the  random  sampling  algorithm  follows. 
Remark  3.9:  Results  can  be  extended  in  a straightforward  fashion  to  the 
case  of  non-zero  mean  noise;  the  details  are  omitted. 


3.5 . Convergence  in  Performance 

Now  we  present  some  results  on  the  convergence  of  the  performance 
of  the  optimal  estimator  to  that  of  an  estimator  operating  with  complete 
knowledge  of  the  true  transition  matrix.  We  make  use  of  some  results  of 
Hi lborn  and  Lainiotis  [24]  after  some  modifications.  First  we  need  a 
preliminary  result. 

Lemma  3.9:  Suppose  that  conditions  (H1)-(H4)  are  satisfied.  Given  a 
positive  definite  symmetric  matrix  F,  there  exists  a bound  < 05  such 
that 

E[  [x(k|k;tTq)F  xT(k|  k;nr)]2]  < M1 
for  all  k,  where  x(k|k;i"rq)  = e[x^  | } and  TTqjTTr 

Proof : 

Equation  (3.6)  implies  that 


xCkf  k;TT^) : Sj  XjCk  k)||  X P(I  j (k)  |zk,rrq) 

which  with  Equation  (3.31),  and  after  noting  that  K j ( i ) i | 
yie Ids 

k-1 

x (k|k)j|  ^ I M_  X z.  X c ^ exp[-c2(k-i)] 

J ^ = -30  ^ 

A 

= g(Z^  independent  of  j . 


(3.68) 

s M3  < »,  Vj,i, 


(3.69) 


60 


Therefore,  we  have 

!lx(k  kjTi  )||  * g(Zk_1)  (3.70) 

Now,  the  Cauchy-Schwartz  inequality  yields 

[x(k|k;rrq)  F (k|  k;^)]2  * ||  F||2  X g4(Zk_1)  (3.71) 

It  can  be  shown  that  (e.g.  see  the  proof  of  Lemma  3.7) 

E{g4(Zk_1)}  < - 

and  hence,  the  desired  result  follows  from  (3.71). 

Q.E.D. 

The  following  theorem  implies  the  convergence  of  [x(k| k) -x(k |k;nt) ] 
to  zero  in  quadratic  mean. 

Theorem  3.3:  Suppose  that  the  hypotheses  of  Theorem  3.2  and  Lemma  3.9 
are  satisfied.  Then 

lim  e[  [x(k| k)  - x(k|k;ir  )]TF[x(klk)  - x(k|k;n  )]}  = 0 
k - » 

Proof:  It  follows  from  a modification  in  the  proof  of  Theorem  1 in  [24] . 

To  this  end,  we  note  that,  by  an  extension  of  the  dominated  convergence 
theorem  (problem  5,  p.  96  in  [5]),  a — 0 in  probability  implies  that 

E{ak)  - 0 as  k x,  where  ak>  k = 1,2, is  a sequence  of  positive  random 

variables  such  that  0 s ak  s 1.  We  use  this  fact  to  appropriately  modify 
the  hypotheses  of  Lemma  1 and  Theorem  1 in  [24] . 

Q.E.D. 

Now  we  show  that,  under  the  same  conditions  as  for  Theorem  3.3,  the 
optimal  quadratic  risk  Jk  converges  to  the  average  optimal  risk  J**  for 
the  known  transition  matrix  estimator  where 
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Jk  = Eftx^  - x(k|k)]T  F[xk  - x(k| k) ] } 

and 

Jk  = - x(k|k;TTt)]  F[xk  - x(k|k;rrt)]}  . 

Theorem  3.4:  Suppose  that  the  hypotheses  of  Theorem  3.2  and  Lemma  3.9  are 
satisfied.  Then 

lim  [J*  - J**]  = 0 

k - GO  * K 

Proof:  It  follows  from  a modification  in  the  proof  of  Theorem  2 in  [24] . 
The  modification  required  is  the  same  as  given  in  the  proof  of  Theorem  3.3. 

Q.E.D. 

We  remark  that  the  convergence  theorems  in  this  section  are  general 
and  apply  to  any  (stable)  conditional  mean  estimation  system.  For  example, 
suppose  that  Case  1 of  model  (1.1)-(1.2)  is  modified  to  include  a control 
input  (possibly  closed  loop)  in  state  equation  (1.1).  Then  if  Theorem  3.2 
can  be  shown  to  hold,  results  of  this  section  will  follow  easily. 

3.6.  Example 

In  this  section,  a simple  example  is  presented  to  illustrate  the 
convergence  of  the  a posteriori  probability  of  tt  given  the  observations. 

We  consider  a scalar  linear  system  described  by  the  following 
equations : 

Vh  ■ °-9  *k  +“k 

zk  ' \ + vk  ■ k"°>1-— 

x - N(l,2),  v.  ~ N(0 , 1)  v k 

O X 


A 


Ml— K—  • ' 
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w^  is  zero-mean,  conditionally  white,  Gaussian  noise  process  and  its 
covariance  switches  between  = 0.5  and  Q ^ » 10.  The  transition 

matrix  tt  is  unknown.  It  is  assumed  that  n € {tt^ .tt^  .tt^}  where  represents 
the  correct  value  of  tt  and 


I—  -I 

1 0.8  0.2 

0.995  0.005 

0.6  0.4 

’ nl  = 

, tt2  = 

0.2  0.8 

0.005  0.995 

0.4  0.6 

The  Markov  chain  states  belong  to  { 1 , 2 } . We  take  the  initial  probabilities 
on  the  states  to  be  equal  to  the  steady  state  probabilities,  which  in  this 
case  are 

P(i  = 1 1 tt)  = P(i  = 2 | tt)  = 0.5 
o o 

for  all  the  three  admissible  it's. 

The  system  was  simulated  using  a random  number  generator  and  the 
a posteriori  probabilities  for  the  various  candidate  values  of  tt  were 
computed  for  a single  sample  sequence  of  observations.  Figures  3.1  and 
3.2  illustrate  the  convergence  of  the  a posteriori  probabilities.  In 
Fig.  3.1  results  are  given  for  the  random  sampling  approach  where  the  number 
of  sequences  sampled  were  40.  In  Fig.  3.2  results  are  given  for  the  pseudo- 
Bayes  approximation.  It  is  seen  from  these  figures  that  the  correct  value 
of  the  transition  matrix  was  identified  in  less  than  20  stages  and  convergence 
was  achieved  in  less  than  180  stages. 


3.7.  Discussion 

In  this  chapter  we  presented  an  adaptive  Bayesian  approach  to  optimal 
state  estimation  in  linear  discrete-time  systems  with  unknown  Markovian 
noise  statistics.  First,  expressions  for  the  optimal  estimator  were 
derived  and  expressed  in  a recursive  form.  Then  several  suboptimal 
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Figure  3.1.  The  a posteriori  probabilities  P(rr  |z^)  vs.  time  for  the 
random  sampling  approach.  ^ 
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algorithms  were  discussed  to  alleviate  the  large  computation  and  storage 
requirements  of  the  optimal  solution.  Finally,  asymptotic  behavior  of 
the  solution  was  investigated.  We  examined  the  Bayes  optimal  estimation 
scheme,  whereas,  in  practice,  only  suboptimal  algorithms  can  be  used. 
Assuming  that  the  suboptimal  algorithm  is  a "good"  approximation  to  the 
optimal  scheme,  our  convergence  results  should  then  carry  over  to  the 
suboptimal  schemes  in  some  sense. 

An  example  was  simulated  using  random  sampling  approach  and  the 
pseudo-Bayes  approximation.  It  demonstrated  the  convergence  of  the 
conditional  probability  of  the  true  transition  matrix,  given  the  past 
observations,  to  one.  It  should  be  noted,  however,  that  since,  in 
practice,  only  suboptimal  algorithms  can  be  employed,  therefore,  despite 
Theorems  3.1  and  3.2,  we  may  not  be  able  to  resolve  two  arbitrarily 
"close"  transition  matrices.  This  resolvability  will,  in  general, 
depend  upon  the  "goodness"  of  the  approximation  to  the  optimal  scheme  and 
can,  perhaps,  be  determined  only  through  simulations.  The  significance 
of  Theorem  3.2  lies  in  the  fact  that  it  indicates  insensitivity  of  the 
asymptotic  performance  to  the  priors  P(tt  ).  Moreover,  the  question  of 
resolvability  of  the  unknown  parameter  set  is  of  practical  interest  in 
that  it  facilitates  the  selection  of  an  appropriate  model  structure 
(unknown  parameter  set).  If  our  primary  interest  lies  in  the  asymptotic 
performance  and  if  two  unresolvable  transition  matrices  are  included  in 
the  set  of  unknown  transition  matrices,  then  by  discarding  one  of  the  un- 
resolvable matrices  a saving  in  the  computational  requirements  can  be 
obtained.  The  sufficient  conditions  given  in  this  chapter  aim  to  answer 
this  question. 
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In  establishing  the  consistency  results  given  in  this  chapter,  we 
used  the  sufficient  conditions  given  in  Tse  and  Aton  [63]  . These  are  a 
generalization  of  the  conditions  given  by  Wald  [66]  for  independent 
equidistributed  observation  sequence.  In  the  literature  other  sets  of 
sufficient  conditions  are  also  available;  one  of  them  is  a generalization 
of  the  conditions  given  in  Cramer  [16].  These  conditions  have  two 
drawbacks:  one  is  some  differentiability  and  uniformity  conditions  on 
the  observation  density  which  are  quite  difficult  to  verify,  and  the 
other  drawback  is  that  these  conditions  give  only  "local"  results  as 
opposed  to  the  "global"  results  given  by  the  sufficient  conditions  of 
Tse  and  Anton  [63] . 

We  considered  only  the  case  where  the  set  of  unknown  transition 
matrices  is  finite  (and  discrete).  When  the  set  of  unknown  matrices  is 
compact  (and  continuous),  one  way  to  extend  the  given  method  to  this 
case  is  to  approximate  the  continuous  parameter  space  with  a finite  set 
of  quantized  points,  and  then  solve  the  resulting  finite  parameter  space 
approximation  to  the  problem  by  the  method  presented  here.  A major 
difficulty  with  such  a quantization  technique  is  that  the  quantized 
points  used  in  the  approximation  increase  exponentially  with  the 
dimension  of  the  parameter  space.  One  possible  alternative  to  this 
difficulty  may  be  the  combined  detection-estimation  scheme,  with 
incremental  mean-square  error  criterion  of  Sebald  and  Haddad  [55]. 


— 


67 


CHAPTER  4 


ADAPTIVE  ESTIMATION  UNDER  UNCERTAIN  OBSERVATIONS 


I 


4.1.  Introduction 

In  Chapter  3 we  considered  the  problem  of  adaptive  state  estimation 
in  switching  environments  under  the  assumption  that  the  transition 
probability  matrix  was  unknown.  In  this  chapter  we  are  concerned  with 
state  estimation  for  linear  discrete-time  systems  with  uncertain  observa- 
tions, where  the  uncertainties  are  governed  by  a Markov  chain  with  unknown 
transition  probability  matrix.  Specifically,  we  consider  Case  2 of 
Model  (1.1)-(1.2).  We  have  chosen  to  treat  Case  2 separately  from 
Case  1 because  (i)  it  leads  to  considerable  simplification  in  notation 
and  (ii)  more  importantly,  the  sufficient  conditions  required  to  prove 
convergence  results  for  Case  2 in  Section  4.3  are  different  from  the 
conditions  needed  for  Case  1 (see  Section  3.4). 

An  important  aspect  of  the  problem  of  state  estimation  for  linear 
systems  is  the  case  where  the  observations  do  not  contain  the  signal  with 
probability  one.  As  pointed  out  in  Section  1.1,  such  a situation  may 
occur  due  to  an  intermittent  failure  in  the  observation  mechanism.  It 
may  also  occur  in  a tracking  situation  when  sensor  returns  may  originate 
from  something  other  than  a single  object  being  tracked,  such  as  other 
objects  being  tracked,  new  objects  not  yet  being  tracked,  false  alarms, 
clutter,  and  radio  frequency  interference  [56].  Sawaragi  et.  al. 

[51,52]  were  the  first  to  address  the  problem  of  state  estimation  for 
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linear  discrete -time  systems  with  stationary  interrupted  observation 
mechanism  where  the  statistics  of  the  interruption  process  are  unknown 
but  fixed.  In  both  these  papers  a Bayesian  viewpoint  was  adopted  and 
the  objective  was  to  present  a feasible  adaptive  algorithm  that 
sequentially  produced  an  approximate  MMSE  estimator.  The  asymptotic 
convergence  of  the  a posteriori  probabilities  of  the  Markov  transition 
probabilities,  given  the  past  observations,  was  demonstrated  through 
simulations;  no  theoretical  justification  was  provided 

Our  main  concern  in  this  chapter  is  to  provide  theoretical 
justification  for  the  convergence  of  the  a posteriori  probabilities 
of  the  transition  probability  matrices.  The  objective  is  to  investigate 
the  conditions  under  which  the  constrained  maximum  likelihood  estimate 
of  the  transition  matrix  converges  in  probability  to  the  correct  value 
of  the  transition  matrix.  Then  other  convergence  results  such  as 
convergence  of  the  a posteriori  probability  of  the  transition  matrix 
given  the  past  observations  and  convergence  of  the  performance  of  the 
adaptive  scheme,  follow  easily  as  in  Sections  3.4  and  3.5. 


In  Section  4.2  we  discuss  state  estimators  for  Case  2 of  Model  (1.1)- 
(1.2);  both  MMSE  estimator,  optimal  in  the  Bayesian  sense,  and  suboptimal 
approximations  to  it  are  considered.  The  convergence  results  pertaining 
to  the  optimal  MMSE  estimator  are  discussed  in  Section  4.3. 

4.2.  State  Estimators 

In  this  section  state  estimators  for  Case  2 of  Model  (1.1)-(1.2)  are 
discussed.  We  treat  both,  Bayes  optimal  MMSE  estimator  and  suboptimal 
approximations  to  it. 


? 
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Optimal  Estimator:  The  derivation  of  the  optimal  estimator  parallels 
that  given  in  Section  3.2  for  Case  1.  Therefore,  we  shall  omit  most  of 
the  details  and  simply  indicate  the  changes  to  be  made  in  the  equations 
given  in  Section  3.2,  appropriate  to  Case  2. 

Note  that  since  only  {v^}  assumed  to  be  modeled  by  a Markov 
chain,  we  only  have  a two-state  Markov  chain,  with  unknown  transition 
probability  matrix  tt  € , . • • .rr^}  . Let  the  a priori  probability 

of  the  qC^  transition  matrix  be  denoted  by  P(tt^)  as  in  Section  3.2.  The 
problem,  as  in  Section  3.2,  is  to  obtain  the  conditional  mean  of  state 
x^  given  the  past  observations  Z^.  Define  a Markov  chain  state  sequence 
I(k)  as 


i(k)  = {v0»v1»---»vk} 


(4.1) 


which  replaces  Equation  (2.2)  in  Section  2.2.  Let  I^(k)  denote  a specific 

k+1 

state  sequence  from  the  space  of  the  sequences  I(k)  which  contains  2 
elements.  Furthermore,  I (k),  I (k-1),  m and  n used  in  Equation  (3.7) 

J ** 

are  now  defined  by  the  relations 


!j(k)  = [l£(k-l),  vk  “ ni},  m = 0 or  1 


(4.2) 


I£(k"1)  * £V0’V1’  ‘ ‘ * ,vk-2,Yk-l  = n^’  n = 0 or  1 


(4.3) 
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All  other  equations  in  Section  3.2  remain  the  same  except  that  the  Kalman 
filter  equations  needed  to  compute  f (z^| Z^_ ^ , I (k) ) and  x^(k|k)  are  now 
appropriately  modified  to  suit  Case  2 of  Model  (1.1)-(1.2).  These 
modifications  will  be  discussed  later  in  Section  4.3  (see  Equations 
(4.4)-(4. 9) ) . 

As  pointed  out  in  Section  2.2  the  optimal  estimator  requires 
exponentially  increasing  storage  and  computation  capabilities.  To 
circumvent  this  difficulty,  we  now  discuss  several  suboptimal 
approximations . 

Suboptimal  Estimators:  All  the  three  approximations  discussed  in  Section 
3.3  are  applicable  here,  with  minor  changes  caused  by  switchings  in  the 
observation  matrix  (see  Equation  (1.2)).  The  changes  are  straight- 
forward and  therefore,  will  not  be  discussed  any  further. 

In  the  following  section  we  discuss  some  convergence  results 
pertaining  to  the  optimal  MMSE  estimator. 

4.3.  Convergence 

In  this  section  the  asymptotic  behavior  of  the  Bayes  optimal  MMSE 
estimator  is  investigated.  Attention  is  focused  on  investigation  of  the 
sufficient  conditions  under  which  the  CML  (constrained  maximum  likelihood) 
estimate  of  the  transition  probability  matrix  converges  in  probability 
to  the  correct  value  of  the  transition  matrix.  Then  other  convergence 
results  such  as  convergence  of  the  a posteriori  probability  of  the  transi- 
tion matrix  given  the  past  observations  and  convergence  of  the  performance 
of  the  adaptive  scheme,  follow  easily  as  in  Sections  3.4  and  3.5. 


71 


It  should  be  noted  that  in  the  sequel,  as  in  Chapter  3,  the 
distinction  between  a random  vector  and  its  particular  realization 
(value)  will  not  always  be  made  in  the  notation,  rather  it  will  be  left 


to  context 


The  Kalman  filter  matched  to  a specific  state  sequence  I^(k)  is 
given  by  the  following  equations: 


x.(k|k)  = y j (k,k-l)Xj (k-1 | k-1)  + Kj(k)zR  (4.4) 

K (k)  = Px(k|k-l;j| j)yk  CT[CPx(k|k-l;j| j)CTV^  +R]"1  (4.5) 

J j j 

P ( k | k; j | j ) = Y.(k,k-l)P  (k-l|k-l;j| j)Y^(k,k-l)  + F (k)  (4.6) 

X J A J J 


F (k)  = [I-K  (k)Cy  ]BQBT[I-K  (k)CV.  ]T  + K.(k)RK*(k) 
J J J J J 


where 


Yj(k,k-1)  = (I-K.(k)Cvk  )A 


Yj(M)  = Y j (k,  k-l)Y  j (k-1,  k-2 ) Y j (/  + 1,1) 

Px(k|  k-1;  j | j)  = E[[xk  - * (k|k-l)][xk  - x.(k|k-l)]T|l.(k)} 


(4.7) 


(4.8) 


(4.9) 


y.  is  the  "last"  element  in  the  sequence  I (k). 

J j 

We  impose  the  following  conditions  on  Case  2 of  Model  (1.1)-(1.2): 
(CH  1)  The  state  transition  matrix  A in  the  system  Model  (1.1)-(1.2) 
has  all  its  eigenvalues  inside  the  unit  circle. 
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(CH  2)  Assume  that,  for  some  5 > 0, 


max|\([l-K  (k)Cy  ]A)|*  <1-6,  V k and  j,  j = 1,2, . . . ,2k+1, 
X J J 


where  K.  (k)  and  y,  are  as  defined  in  the  Kalman  filter  Equations 
^ j 

(4.4)-(4.9).  Furthermore,  assume  that  the  pair  (A,C)  in 


(1.1)-(1.2)  is  observable. 

(CH  3)  For  all  tt  € 0,  we  have  pQC),  p^  < 1,  where  Q = • ,ttmJ  . 

A consequence  of  this  is  that  the  2-state  Markov  chain  governed 
by  any  n(Q  is  ergodic  [19]  (see  also  Remark  3.1  in  Chapter  3). 


(CH  4)  Suppose  that  the  system  (1.1)-(1.2)  begins  at  initial  time  k = k^. 


Let  Ij(n,k)  represent  a specific  sequence  from  the  set  of  Markov 


chain  state  sequences  I(n,k)  = Cvk»Yk+1» • • • »Vn) » n 2:  k, 


j = 1,2,.  ..,2n  k+^.  Let  Z , = [z,,k  < k<  n} . Assume  the 

n , Kq  k u 


following : 

k “",..P(Ij<"-k)lnq’Zn,k0>  ’ P<y"'k)|VZn,-.>  a-e-  Pnt 


where  n,k  > -«  are  fixed,  and  P is  a probability  measure  induced 


TT 


on  the  range  space  of  the  random  sequence  [ ,z ,z  } 

k n 


parameterized  by  the  correct  value  of  the  transition  matrix  it. 
The  following  theorem  states  the  weak  consistency  of  the  CML 
estimate  of  tt. 


Note  that  here  we  are  using  the  same  notation  for  a random  variable  and 
its  values. 


*X(B)  = an  eigenvalue  of  matrix  B. 
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Theorem  4.1:  Suppose  that  the  conditions  (CH  1)-(CH  4)  hold.  Then  the 

CML  estimate  rr(Z  ) of  tt  converges  in  probability  to  rr.  as  n -•  ®. 
n t 

The  proof  for  Theorem  4.1  consists  of  satisfying  the  sufficient 
conditions  given  in  Tse  and  Anton  [63],  as  in  the  case  of  Theorem  3.1  in 
Chapter  3.  Several  preliminary  results  are  needed  first,  some  of  which 
parallel  those  given  in  Chapter  3. 

Lemma  4.1:  Suppose  that  conditions  (CH  1)-(CH  2)  are  satisfied.  Then 
the  homogeneous  part  of  the  Kalman  filter  matched  to  any  Markov  chain 
state  sequence  is  uniformly  asymptotically  stable,  uniformly  in  time  and 
in  all  possible  state  sequences. 

Proof : The  homogeneous  part  of  the  Kalman  filter  matched  to  I^(k)  is 
given  by 


x.(k|k)  = y (k,k-l)x  (k-l|k-l) 


(4.10) 


From  condition  (CH  2)  ani  Equation  (4.8)  it  follows  that  Yj(k,k-1)  is 
uniformly  asymptotically  stable,  uniformly  in  j and  in  k.  Therefore, 
there  exist  real  numbers  c^.c^  > ® (independent  of  k and  j)  such  that 
[30] 


||y j (k,£ )il  < CjexpC-c^k-jJ )]  V j (4.11) 

Q.E.D. 

The  condition  (CH  2)  is  crucial  to  the  proof  of  Lemma  4.1,  however, 
alternate  conditions  may  be  used  to  replace  (CH  2).  These  assumptions 
will  be  denoted  by  (CH  2.i),  i = 1,2,3. 
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Assumption  (CH  2.1):  Assume  that 

Px(k|k-l;j|j)cVLC  * 0 Vk 

Furthermore,  the  pair  (A,C)  is  observable. 

Together  with  condition  (CH  1),  condition  (CH  2.1)  can  then  be  used  to 
prove  Lemma  4.1.  To  show  tnis,  from  [28,  Chap.  7,  Equation  (7B.1)],  we  have 

I-K.(k)Cy  = [i  + y?  p(k|k-l;j|j )CTr~1C] _1  (4.12) 

J Kj  x 

Therefore,  from  Equation  (4.8)  it  follows  that 

\\f  (k,k-l)j|  < ||C  I + Y l Px(k|k-l;j|j)CTR'1C]'1||.i|A||+  (4.13) 

J j 

Now  condition  (CH  2.1)  yields 

IlC1  +vl  Px(k!k-l;j|j)cV1C]"1j|  < ||I||  = 1 (4.14) 

j 

Since  A is  asymptotically  stable,  there  exist  real  numbers  c^,c2  > ® 
(independent  of  k and  j)  such  that  [30] 

jj  A||  k < c1exp[-c2k]  , ks  0 (4.15) 

From  Equations  (4.9)  and  (4. 13 )- (4 . 15 ) , we  have 

||Yj(k,X)||  < ||a|| k”^  < c1exp[-c2(k-A )]  , V j (4.16) 


In  this  thesis,  we  take  ||‘A||  = max[\(A  A)]  . 

\ 
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T - 1 

The  requirement  that  the  matrix  P (k|k-l;j|j)C  R C be  positive  semi- 

x 

definite  is  more  restrictive  than  (CH  2),  but  is  easier  to  \erify.  It 

is  obviously  satisfied  for  a scalar  dynamical  system.  Also  consider  the 

case  where  all  the  state  variables  are  observed  separately  and 

independently,  i.e.,  let  C be  a diagonal  matrix.  Then  (CH  2.1)  holds. 

Another  example  where  (CH  2.1)  holds  is  the  case  of  scalar  observations 

with  C as  a row  vector  with  all  entries  equal  to  one.  In  general,  the 

T “1 

matrix  ( k | k-1 ; j | j )C  R C is  positive  semi-definite  if  the  symmetric 

T -1 

positive  semi-definite  matrices  P (k|k-l:j|j)  and  C R C commute.  From  a 

result  in  Bellman  [8,  p.  56],  it  follows  that  if  two  positive  semi- 

definite  matrices  G and  H commute,  then  there  exists  an  orthogonal  matrix 

which  will  simultaneously  diagonalize  G and  H.  Let  W denote  such  an 

orthogonal  matrix.  Then  W^W  = I,  and  WTGW  and  WTHW  are  diagonal  matrices. 

T T 

Furthermore,  since  G,H  £ 0,  it  follows  that  W GW  and  W HW  have  only 

T T T 

nonnegative  diagonal  elemerts;  hence,  GH  = W(W  GW)(W  HW)W  is  positive 
semi-definite . 

Assumption  (CH  2.2):  The  pair  (A,C)  is  observable  and  the  state  equation 

(1.1)  is  controllable.  Furthermore,  MAil  < \ ■ /X  „„ 

' H min  max 

where 

Xmin  “ inflX(px(klk_1^|J>)|  V j and  k 
X 

Xmax  2 sup|X(px(k|  k-l.;  jX  j ) ) | V j and  k 

X 

Now  we  shall  prove  Lemma  4.1  using  Assumption  (CH  2.2). 


76 


Let  I (k)  denote  the  Markov  chain  state  sequence  with  all  entries  equal 
J1 

to  1,  i.e.,  for  which  * IV  k,  Then  we  have 


Px(k|k-l;j1|j1)  ^ Px(k|k-l;j|j)  Vj 


(4.17) 


j = 1,2 


2k+l 


since  the  filter  matched  to 


(k)  makes  use  of  all  the 

1 


observations.  Now  from  the  controllability  and  observability  assumptions 
and  Lemma  7.2  in  [28]  (see  also  [14]  and  [25]),  it  follows  that  there 
exist  a > 0 and  integer  N > 0 such  that 


Px(k|k-l;j1|j1)  * a I Vk*N  (4.18) 

Furthermore,  by  using  condition  (CH  1)  we  can  show  that  (as  in  Lemma  3.2) 
there  exists  8 > 0 such  that 

Px(k|k-l;j|j)  < pi  V j and  k (4.19) 

From  (4.17)  and  (4.18),  <*<  inf|x(P  (k| k-1 ; j | j ) ) | and  from  (4.19), 

X x 

8 2:  sup|x(P  (k| k-1; j | j))|  for  all  j and  k.  Now  Equation  (4.8)  may  be 
X x 
rewritten  as 

¥j(fc,k-l)  - Px(k| k; j j j )P"1(k| k-1; j | j )A 

where  we  used  Equation  (7.105)  in  Chapter  7 of  [28].  Furthermore,  we 
have 

Px(k|  k;  j | j ) = I] P ~ 1 ( k | k-1;  j | j)  + cVVf1 
< Px(k|k-l;j| j)  since  R_1  > 0 . 


■MMX 


LI.  | 
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Hence 

||y.(k,k-l)||  < (|Px(k!  k-1;  j ) j)||x||p”L(k|  k-1;  j | j)Hx||A|| 

£ X.»K  ’‘“I" 

< 1,  by  assumption. 

Therefore  there  exist  c^,c2  > 0 such  that 

||y j (k,/ )||  < ciexp[-c2(k-£)]  , V j. 

The  existence  of  Xmax  and  Xm^n  has  already  been  proved.  For  example,  take 

X =8  and  X . “a.  An  implication  of  the  condition  (CH  2.2)  is  that 

max  min 

the  eigenvalues  of  A should  be  sufficiently  "close"  to  the  origin. 

Assumption  (CH  2.3):  In  model  (1.1)-(1.2),  Case  2,  BQBT  > 0.  Also,  the 
observation  matrix  C is  of  full  rank. 

The  assumption  (CH  2.3)  implies  that  there  exist  real  numbers  and  8^ 
such  that  the  conditions  given  by  (4.20)  and  (4.21)  hold: 

BQBT  i>  aLI,  > 0 (4.20) 

CTR_1C  2 P 1I,  ?1  > 0 (4.21) 

Furthermore,  there  also  exist  real  numbers  and  $2  such  that  (4.22) 
and  (4.23)  hold: 

BQBT<a2I,  <*2  < » (4.22) 

CTR_1C  < S2I,  82  < » (4.23) 

+ For  a symmetric  matrix  D,  j| Djj  = max|X(D)|. 

X 


r 
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The  conditions  (4.20)-(4.23)  are  much  stronger  than  the  usual 

controllability  and  observability  assumptions.  These  conditions  may  now 

be  used  to  prove  the  uniform  asymptotic  stability  of  Yj(k,£),  given  by 

Equation  (4.9),  for  all  j,  k,  £.  To  this  end,  note  that,  from  Equation 

(4.8),  whenever  y,  = 0,  y (k,k-l)  = A where  A is  stable  by  condition 
j ^ 

(CH  1).  So  we  need  to  consider  only  those  components  of  Yj(k,£)  in  (4.9) 
for  which  y.  = 1,  k<  i < £.  Now  we  may  use  the  Lyapunov  function 
technique,  as  in  the  proof  of  Lensna  3.5,  to  prove  the  asymptotic  stability 
of  Y j (k,£ ) . We  make  an  extensive  use  of  the  conditions  (4. 20)- (4. 23 ) ; 
details  are  omitted. 

Finally,  it  is  conjectured  that  the  stability  of  matrix  A should  be 
sufficient  to  ensure  uniform  stability  of  the  Kalman  filters  matched  to 
all  possible  state  sequences.  A proof  is  not  yet  available. 

Lemma  4.2:  There  exist  real  numbers  p > ® such  that 

8LI  < Pz(k|k-l;j|j)  < p2I  V j,k;  j = 1,2,. ..,2 


k+1 


where 


and 


p2(k|k-l;j| j)  = E{[zk-2(k|k-l;j)][2k-z(k|k-l;j)]T!lj(k)} 


z(k| k-1 ; j ) - E[zk|Zk_1,IJ(k)} 
- *kj C 
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Proof : We  have 

P (k|k-l;j|j)  = y*  C Px(k|k-l;j|j)CT  + R (4.24) 

j 

SR  V k,j  ,i 

s since  R > 0 by  assumption. 

From  the  condition  (CH  1),  it  follows  that  there  exists  a positive  number 
M < • such  that 

l|Px(k|k-l;j|j)||  < M V J,k  (4.25) 

(For  example,  take  Xj(k|k)  = Xq  = x^(0|0)  V k,j;  a suboptimal  estimator.) 
Hence,  there  exists  a real  number  3^  > 0 such  that 

Pz(k| k-1; j | j)  < p2l  V j,k 

Q.E.D. 

Lemma  4.3:  Given  the  dynamical  system  (1.1)-(1.2),  the  observations  z^ 
and  z^  become  independent  as  |k-n|  — » for  all  k,n  S 0. 

Sketch  of  Proof:  It  follows  from  condition  (CH  1),  normality  of  x^  and 
ergodicity  of  the  Markov  chain  corresponding  to  the  true  transition 
matrix  n . 

Q.E.D. 

Lemma  4.4:  Let  the  dynamical  system  (1.1)— (1.2)  begin  at  initial  time 
kQ  - (instead  of  k^  * 0).  Then,  for  k 2;  > -®,  the  observation 

sequence  £ z is  stationary. 
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Sketch  of  Proof:  It  follows  from  conditions  (CH  1)  and  (CH  3). 


Q.E.D. 


Lemma  4.5:  (i)  Let  the  dynamical  system  (1.1)-(1.2)  begin  at  initial 


time  kg  = £.  Then 


a-e-  \ 

where  f ( • ! * »tt)  is  the  probability  density  function  of  z,  given 

2klZk-l,X  k 

Z,  , = fz  } assuming  that  the  transition  matrix  it  is  true; 

k-l,i  n n*i 

k k 

r»,  . = fa.},  . denotes  the  value  taken  by  the  random  sequence  [z.l.  . 

*k,i  1 i=t  J " iJi-i 

(ii)  Let  the  dynamical  system  (1.1)-(1.2)  begin  at  initial  time  k^  = 

Then  the  random  sequence  [f^  (ZjJ Zk-1 ,n^ k-0  becomes  asymptotically 

k'  k- 1 

stationarv,  where  Z,  . = fz  ,0<  n<  k-l}. 

* k-l  11  n 

Proof : It  parallels  entirely  the  proof  of  Lemma  3.6. 

Lemma  4.6:  The  random  variable  i n f (z,  |Z,_  , ,tt)  is 

Wl,—  * 

integrable,  k =•  0,1, , where  Z,  ■ [z  ,-<*><  i < k}  . 

tC  j *•  1 

Proof : It  is  the  same  as  the  proof  of  Lemma  3.7. 

Lemma  4,7:  Define 

yk  *>  £n  f (Z[c)zk.1.TT)  - E[in  f (zj  ,tt)} 


Lemma  4.6:  The  random  variable  in  f 


where 


Zk  - [z^O  < i < k}.  Then 

Pr[n  1|y1  + y2  + ...  + yn|  > e]  - 0 as  n - 


for  any  e > 0. 


Proof ; From  Lemmas  4.3  and  4.5,  it  follows  that  E(y^ykJ  - 0 as 
j k— i j — as.  Then  use  the  result  in  Feller  [20,  p.  240]  and  asymptotic 
stationarity  of  the  sequence  {y^.O  < k < ®]  to  obtain  the  desired  result. 

Q.E.D. 


■MMMwK  mm 
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Now  we  are  ready  to  prove  Theorem  4.1. 

Proof  of  Theorem  4.1:  It  consists  of  satisfying  the  6 conditions  of  Tse 

and  Anton  [63],  discussed  in  the  proof  of  Theorem  3.1  in  chapter  3. 

Conditions  (AC1)  through  (AC4)  are  satisfied,  as  in  the  proof  of 

Theorem  3.1,  using  the  results  of  Lemmas  4.1  through  4.7.  To  show  that 

the  condition  (AC5)  is  satisfied,  we  note  that,  for  rr  ^ nt>  we  have 

lim  tt11  t lim  (tt^)0,  consequently,  the  observation  process  {z^}  cor- 
n — ® n -»  ® 

responding  to  these  it's  become,  in  the  limit,  two  stationary  processes 

with  different  statistics  (see  Lemma  4.4).  Then 

lim  f (zn|Zn_^,TT)  / lim  f(z^ ) Z^_ ^ ,nt ) over  a set  of  nonzero  probability, 
n — ® n -*  ob 

Combining  this  with  Lemma  4.5,  it  follows  that  condition  (AC5)  is  satisfied 

Then  condition  (AC6)  can  be  shown  to  hold  as  in  the  proof  of  Theorem  3.1. 
This  proves  Theorem  4.1. 

Q.E.D . 

Remark  4.1:  It  concerns  condition  (CH  4).  The  comments  made  in  Remark 
3.7  are  applicable  here  too. 

By  using  Theorem  4.1,  we  can  easily  establish  convergence  results 
similar  to  those  given  in  Theorems  3.2,  3.3  and  3.4.  We  omit  the  details 
to  avoid  unnecessary  repetitions.  Finally,  it  should  be  noted  that  the 
scheme  outlined  in  Section  4.2  is  optimum  in  the  Bayes  sense  at  every 
step  k and  the  optimality,  therefore,  is  independent  of  convergence. 
Nevertheless,  it  is  of  interest  to  know  the  conditions  under  which  the 
solution  converges  to  one  that  uses  the  correct  transition  probabilities, 
since  it  makes  the  asymptotic  solution  independent  of  the  priors  P(tt  ). 


— , — r- 
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4.4.  Discussion 

The  asymptotic  behavior  of  a Bayes  optimal  adaptive  estimation  scheme 
for  a linear,  discrete-time  system  with  Markov  interrupted  observations 
was  investigated,  where  the  transition  probability  matrix  of  the  Markov 
chain  governing  the  interruption  process  was  assumed  to  be  unknown.  The 
main  objective  was  to  provide  theoretical  justification  for  some  simulation 
results  in  [52].  We  examined  the  Bayes  optimal  estimation  scheme  whereas 
in  [52]  simulation  results  were  given  using  a suboptimal  algorithm. 

Assuming  that  the  suboptimal  algorithm  is  a "good"  approximation  to  the 
optimal  scheme,  our  convergence  results  should  then  carry  over  to  the 
suboptimal  scheme  in  some  sense. 

The  convergence  results  given  in  this  paper  can  easily  be  extended 
to  a slightly  more  general  dynamical  model  where  the  measurement  Equation 
(1.2)  is  replaced  by 

Zk  " YkC*k  + d-Vlc)v^1)  + v<2) 

where  [vj^],  i = 1,2,  are  two  independent,  zero-mean,  white  Gaussian 
noise  sequences  with  constant  covariances  (see  e.g.  [37]). 

Finally,  comments  made  in  Section  3.7  apply  here  too. 


CHAPTER  5 


ESTIMATION  FOR  SYSTEMS  WITH  UNKNOWN, 
TIME -INVARIANT  NOISE  COVARIANCES 


5.1.  Introduction 

So  far,  in  the  last  few  chapters,  we  have  dealt  only  with  a particular 
type  of  parameterization/modelling  of  the  switching  characteristics  of 
a linear  discrete-time  system.  Specifically,  we  assumed  that  a finite- 
state  Markov  chain  describes  the  switchings  in  the  noise  characteristics. 
Such  a probabilistic  description  may  not  always  be  appropriate.  In  this 
chapter,  and  also  the  next,  we  investigate  a detection-estimation  scheme 
for  state  estimation  in  uncertain  dynamical  systems  with  unknown  noise 
covariances.  Information  regarding  the  unknown  parameters  is  assumed 
to  be  available  in  the  form  of  multiple  bounds  on  the  values  of  the 
parameters  and  their  time-derivatives . In  this  chapter  attention  is  con- 
fined to  constant  noise  covariances;  the  time-varying  case  is  treated  in 
the  next  chapter. 

The  proposed  scheme  is  based,  to  an  extent,  on  the  approach  given  in 
[46] • It  is  a heuristic  extension  of  the  standard  minimax  scheme  to  the 
case  when  multiple  bounds  (disjoint  or  nested)  on  the  unknown  parameters 
are  available.  To  this  end,  a weighted  conditional  mean-squared  error 
cost  structure  is  formulated.  An  ad-hoc  solution  which  draws  upon  the 
properties  of  the  standard  minimax  solution  to  the  single  bounding  set 
case,  is  proposed.  The  proposed  approach  is  an  attempt  to  alleviate  the 
pessimistic  performance  of  the  standard  minimax  estimator  for  large 
observation  records  and  large  uncertainties,  while  retaining  its  desirable 
small-sample  properties.  A Bayesian  approach  [34],  [36]  would  require 
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specification  of  the  a priori  probabilities  of  occurrence  of  all  possible 
parametric  values.  If  the  "true"  a priori  probabilities  are  "much  different" 
from  the  assumed  probabilities,  small-sample  performance  will  suffer.  The 
small-sample  performance  of  a given  scheme  is  particularly  important  for 
time -varying  parameter  case  when  not  much  is  known  regarding  the  time- 
variation  of  the  parameter.  The  given  scheme  is  expected  to  be  useful 
mainly  in  the  time-varying  parameter  case;  the  present  work  limited  to 
constant  parameters  is  intended  as  an  illustration  of  the  concepts  involved. 

The  proposed  scheme  is  formulated  in  Section  5.2.  A weighted  condi- 
tional mean-square  error  cost  structure  is  proposed  in  Section  5.2,  to 
which  an  approximate  solution  is  given  in  Section  5.3.  In  Section  5-4  the 
asynptotic  behavior  of  the  solution  is  investigated.  In  Section  5.5  an 
example  is  considered  to  illustrate  the  results. 

5.2.  Proposed  Scheme 

We  consider  Case  3 of  model  (1.1)-(1.2).  Recall  that  the  given  model 
is  specified  up  to  a set  of  time-invariant  unknown  parameters  denoted  by 
the  vector  9 of  dimension  q.  This  uncertainty  may  be  in  constant  noise 
covariances  Q or  R only,  which  are  assumed  to  be  continuous  functions  of  9. 
The  unknown  parameter  vector  9 is  assumed  to  satisfy  at  least  one  of  the 
following  conditions 

9 € i = 1,2  , . . . ,N 

where  ^ , i = 1,2,...,N  is  a collection  of  compact  subsets  oflR^. 

It  is  desired  to  estimate  the  value  of  x^  based  on  the  observations 

Zk  * Czj’  0 " J - k}* 

a 

Let  x(k)  denote  an  estimate  of  x,  based  on  the  observation  Z,  . The 

k k 

conditional  mean-squared  error  (CMSE)  of  this  estimate  given  the  observa- 
tion Z^_  and  the  unknown  parameter  9 is  given  by 
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L[x(k),Zk>0]  = Efi'x^  - x(k)  2 | , 0 } . 


It  is  required  that  the  selected  estimator  minimize 

max  L[x(k),Z  ,9],  i=l,2,...,N  (5.1) 

9 € Cl  * 

This  is  a multiple  objective  optimization  problem.  It  is  generally  not 
possible  to  minimize  (5.1)  with  respect  to  x(k)  for  all  i;  the  minimiza- 
tion of  the  maximum  error  for  values  of  the  parameter  in  different  regions 
might  be  conflicting.  So  we  need  a scalar  cost  functional.  The  optimal 
estimator  is  defined  as  the  one  that  minimizes  a weighted  sum  of  (modified) 
maximum  errors  given  as 
N 

Z max  {\!  L[x(k),Z  ,9  ]]  (5.2) 

i=l  9.€  Q.  1 k 1 

l i 


where 


*■[  - 

J=1 


V)* 


(5.3) 


and 

N 

Z \ = 1,  \ > 0,  1=1,2,. . . ,N 

i-1  1 


Here  9 implies  9 € . In  the  Expression  (5.2)  the  maximizing  8^  is 

selected  while  keeping  9 , j#i,  j=l,2,...,N,  fixed.  (Note  that  L ] 
is  also  a function  of  0 ^ , j/i , j=l,2, — ,N,  through  x(k).)  A consistent 
solution  to  (5.2)  for  maximizing  9^,  i=l,2,...,N  may  not  exist.  We 
assume  that  either  it  does,  or  an  order  of  precedence  among  0 1 s,  for 
maximization,  is  specified. 


f(Z,  1 9 . ) denotes  the  density  function  of  Z,  given  9..  In  Section  5.4, 
k i k i 

to  avoid  confusion,  we  also  use  the  notation  ^(^>9^)  for  f(Z^|9^). 
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The  weighting  parameters  measure  the  relative  importance  attached 
to  errors  in  each  region  . The  fixed  parameters  are  design  parameters 
that  measure  the  a priori  imporatnce  assigned  to  the  maximum  errors  in 
each  region.  Ihe  other  part  of  V is  observation  dependent.  Use  of  the 
relative  conditional  probability  of  occurrence  of  the  given  observation 
in  the  weighting  parameters  ensures  that  the  optimum  estimator  does  not 
minimize  the  maximum  error  in  a particular  region  at  the  expense  of 
too  large  an  increase  in  the  maximum  CMSE  in  other  regions,  when  the 
relative  probability  of  occurrence  of  the  given  observation  due  to  9 in  the 
former  region  is  small. 

The  scalar  cost  functional  (5.2)  can  be  viewed  as  a generalization 
of  the  performance  criterion  of  Magill  [36]  . If  the  domain  of  9 is  finely 
quantized  such  that  the  ' s are  disjoint  sets  each  with  a single  element 
corresponding  to  a possible  value  of  9,  and  \ represents  the  a priori 
probability  that  9 € CT  , then  the  given  scalar  cost  functional  reduces  to 
the  performance  criterion  of  Magill;  because  then  V = P{9  € fh  | Z^}  . 

If  the  objective  is  to  minimize  the  mean-squared  error  given  9 in 
some  minimax  sense,  the  proposed  criterion  is  clearly  more  pessimistic. 

From  a game -theoretic  viewpoint,  in  the  proposed  criterion,  nature  chooses 
the  worst  possible  a posteriori  strategy  whereas  in  case  of  the  usual  CMSE 
cost  function  nature  chooses  the  worst  a priori  strategy.  In  other  words, 
nature  is  allowed  the  knowledge  of  estimator's  observation  in  addition  to 
what  she  knows  in  case  of  the  usual  CMSE  cost  functional. 

Now  the  problem  is  as  given  in  (5.4), 

N 

min  £ max  L[x(k)  ,Z,  ,0  ] } 

x(k)€s  i=l  6 1 k 1 


% 
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where  S denotes  the  space  of  all  measurable  transformations  from  the 
observation  space  Z^  into]Rn.  Exact  solution  to  (5.4)  appears  to  be 
intractable.  An  ad-hoc  solution  is  investigated.  The  structure  of  the 
estimator  is  assumed  to  be  of  the  form  given  by  (5.5). 

N 

x(k)  = 2 X'x. (k)  (5.5) 

i=l 

where  x^  (k)  denotes  the  MMSE  estimator  matohed  to  0^  ^ and  based  on 
observation  Z^.  Note  that  we  are  also  assuming  that  9^'s  in  (5-5)  are  the 
same  as  the  worst  case  parameter  values  in^'s.  This  structure  is  similar 
to  the  one  resulting  from  a Bayesian  viewpoint  [34],  [36].  Also,  when 
there  is  just  one  bounding  set  (i.e.,  N = 1),  we  obtain  the  standard 
minimax  solution.  Now,  this  amounts  to  solving  for  a maximin  solution  to 
(5.4).  For  this  to  be  a valid  solution,  we  must  show  that  problem  (5.4) 
when  viewed  as  a game,  has  a value  and  nonrandomized  strategies  exist  for 
both  x(k)  and  ©i , i = 1,2,...,N.  That  is,  the  following  must  hold, 

N N 

min  2 max  [X}L[x(k)  ,Z.  ,0 .]}  = 2 max  [X'.L[xn(k)  ,Z,  ,9  . ] 3 
x(k)€S  i-1  0t€  ^ k 1 i-1  0t6 


where 


N 

xn(k)  = arg{  min  2 X'L[x(k),Z,  ,9  ]]  . 
0 x(k)€  S i=l  1 k i 


So  far  we  have  been  unable  to  show  this.  Now,  in  general,  a maximin 
solution  is  very  pessimistic.  However,  it  is  well  known  [l7],  [47]  that 
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for  a standard  minimax  scheme  with  quadratic  cost  function,  the  game  has  a 
value  when  uncertainty  lies  only  in  noise  covariances,  Heuristically 
extending  this  approach  to  the  multiple  bounding  sets  case,  we  would  like 
to  investigate  the  performance  of  the  estimator  given  by  (5.5). 

The  problem  now  is  that  of  finding  0,  to  maximize 


Li  " xiEClixk  ‘ *(k)i|2|zk,e.} 


(5.6) 


X[E£i|xk  - xi(^)|!2|Zk,ei}  + \j||x(k)  - *.(k)||2  . (5.7) 


Substituting  (5.5)  in  (5.7)  and  noting  that  E X'  * 1,  we  have 

i-1 


-i  * X[EC||xk  - x1(k)||2|Zk,ei}  + \'||  E X ! (*j (k) 

J^i 


- ft1(k))||d 


(5.8) 


It  can  be  shown  [28,  Chap.  7]  that  for  linear  models 


E[|ixk  - xi(k)||2|zk,el}  = E[j|xk  - xi(k)||2|ei} 


(5.9) 


Therefore,  when  9^,  i * 1,2,...,N  are  known,  can  be  generated  sequentially 
as  a function  of  time  using  N Kalman  filters  one  each  matched  to  0^. 

Now  we  have  to  find 

arg{  max  L , i - 1,2 N . 

9i  € ni 

In  general  no  closed-form  analytical  solution  exists.  In  the  next  section 


a general  procedure  is  given  to  find  a numerical  solution. 


. — - v.  ■ * 
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5.3.  Approximate  Solution 

As  mentioned  in  the  last  section  no  closed-form  analytical  solution 
exists  hence  an  approximate  numerical  solution  is  proposed.  It  is 
obtained  by  parameterizing  the  in  (5.8)  by  approximating  the  con- 
tinuous parameter  space  with  a finite  set  of  quantized  points.  LetQ^  be 
quantized  into  * [0^,  l ■ l,2,...,m^J,  i « 1,2,...,N.  Then  we  have  a 
number  of  Kalman  filters  one  each  matched  to  0^,  & * 1,2,..., m^;  i“l,2,...,N. 
This  gives  us  x.  (k)  for  each  possible  i,l , where  the  subscript  it,  on  x denotes 
a MMSE  estimator  matched  to  parameter  0...  As  a byproduct  we  obtain 
E[||xk  - xu(k)!i2|zk,9u]  and  the  probabilities  for  a11  1 and 

All  these  computations  can  be  performed  recursively  (see,  for  example, 

[57]).  The  Kalman  filters  are  run  in  parallel  as  in  [34],  [36]. 

Now  can  be  found  for  all  possible  0^'s.  For  every  k find 


0 =■  arg[  max  L },  i = 1,2, 

9i^i 


,N 


(5.10) 


Note  that  here  we  are  solving  N coupled  expressions.  Substituting  these 
910,  i - 1,2, ... ,N  in  (5.5)  gives  the  required  estimator  x(k). 

Note  that  the  memory  requirements  are  constant  with  time.  However, 
the  number  of  quantizations  required  may  be  large,  particularly  if  the 
dimension  of  the  unknown  parameter  0 is  large.  This  tends  to  make  the 
proposed  scheme  computationally  involved  for  large  dimensional  0.  Note 
that  the  computational  complexity  of  the  given  scheme  is  of  the  same 
order  as  that  of  some  of  the  adaptive  schemes  [34],  [36].  Further,  such 


I I — I -- 
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a parallel  filtering  structure  has  also  been  proposed  in  various  other 
schemes  [35],  [54],  [58].  The  problem  at  hand  is  basically  a nonlinear 
estimation  problem.  As  is  generally  the  case,  a "global"  solution  [58] 
to  such  a problem  is  computationally  complex. 

5.4.  Asymptotic  Behavior 

In  this  section  we  show  that,  under  certain  conditions,  the  solution 
obtained  in  Section  5.2  converges  to  a MMSE  estimate  matched  to  a value 
of  the  unknown  parameter  9 which  belongs  to  the  same  bounding  set  £1^  as 
does  the  true  parameter  value  9Q.  The  results  are  given  only  for  the  case 
where  9 may  belong  to  one  of  two  bounding  sets  and  fl2.  Extension  to 
larger  number  of  sets  1 2 3 is  straightforward;  hence,  it  is  omitted. 

Equation  (5.8)  has  the  following  form  for  the  case  of  2 bounding 


sets : 


L 


1 


L 


2 


where 


X{  E[||xk  - x1(k>|| 2 1 91J  + X{(X2)2||x2(k)  - ^ 
X2  E[||xk  - x2(k)||2|92}  + X2(X{)2||x2(k)  ' *1 
Xj  - X1f(Zk|91)/(X1f(Zk|91)  +X2f(Zk|e2)) 

x2  " A.  2^  I (X^f(Zk|9^)  + X2f(Zk|©2)) 


(toll2 

(5.11) 

ooll2 

(5.12) 

(5.13) 

(5.14) 

Xi  + X2  - + \2  - 1 
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The  estimate  in  Equation  (5.5)  becomes 

x(k)  - X'  xx(k)  + x2(k)  (5.15) 

Let  Xq ( k ) denote  the  optimal  solution  obtained  by  the  approximate 
algorithm. 

First  we  need  the  following  two  lemmas. 

Lemma  5.1:  Let  Ln(9)  * In  f(Zn;9).  Suppose  the  system  (1.1)-(1.2)  is 
asymptotically  stable  and  Kalman  filter  matched  to  any  9 € 0 ■ U fig 
is  asymptotically  stable.  Then 

1 

lim  — L (0)  ■ L (9)  a.e.  P.  uniformly  in  0 € n, 
n n ® B 

n -»  ® o 

where 

La9(0)  - lim  E{ln  £ (*J Zn_i ;0 ) | eQ} 
n — « 


and  where  the  expectation  is  taken  over  the  random  variables  z , 
j * 1,2,...,  assuming  that  they  obey  (1.1)-(1.2)  with  0 = 0q,  the  true 
parameter  value. 

Proof;  See  Kashyap  [32].  (See  also  [63]  and  [15].) 

Lemma  5.2:  Suppose  that  conditions  of  Lemma  5.1  are  satisfied.  Let 


• L##(0),  p - 1,2, ...  ,q,  be  continuous  for  9€f)  fl  ^ U ^ 2 » where 

P th 

0 denotes  the  p component  of  0.  Assume  that  S (0)  * 0,  p - l,2,...,q, 


Sp(0) 


has  a unique  solution  9 = 9^  and  that  9q  maximizes  La#(9),  where  0q  is  the 
true  parameter  value.  Then  L (9)  is  a monotone  increasing  function  of  9 


for  0 < 9_  and  a monotone  decreasing  function  of  0 for  0 2 9«_. 

p Op  p p Op 


t Z 

P is  a probability  measure  induced  on  the  range  space  of  the  random 
9o  . -i 

sequence  (Zq.z^Zj,  . . . } indexed  by  the  true  parameter  value  0Q. 

. - 
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Proof : It  follows  trivially  from  the  assumptions. 

Uniqueness  of  the  root  of  Sp(0)  at  0 - 0^  is  a consequence  of  the 

uniqueness  and  consistency  of  the  maximum  likelihood  estimate  of  0 [15,32]. 

Now  we  present  the  convergence  results. 

Theorem  5.1.  Let  and  Cl^  be  disjoint  bounding  sets  where 

■ ^®11  — ® ^2  * t®21  ^ ® — ®22^  ’ ® 12  ®21  and  where  the  vector 

2 

inequality  holds  componentwise.  Suppose  Efllxn  “ 5c(n >|!  1 9 } is  a monotone 
increasing  function  of  0 for  all  n.  Suppose  the  conditions  of  Lemmas  5.1 
and  5.2  are  satisfied.  Let  0^6  where  0q  is  the  true  parameter  value. 
Then  the  following  holds : 

(i)  if  f(zn;92i>  < f^Zn»®n^  for  n 2 some  n0’  then  lim  *0^  ls  3 

n — ® 

estimate  matched  to  0^. 

(ii)  If  f (zn»®21^  S ^(zn’®ll^  ^°r  n a some  nQ»  then  in  lim  Xg(n) 

n -♦  ® 

either  ©2  - 021>  0X  * ©12>  or  02  = 0£,  - 0j^  (see  Equation  (5.15)) 

where  0 ' , i = 1,2,  are  such  that  L (0,' ) * L (01).  In  the  former  case 
1 ® 1 ® z 

lim  xq(o)  is  a MMSE  estimate  matched  to  0^. 
n -•  ® 

Proof:  From  Lemma  5.1,  given  6 > 0 3 N(6 ) such  that 
l“L  (0)  - LJQ)\  <6/2  a.e.  P* 

o 

for  n a N(6)  uniformly  in  0 € fl . Therefore,  we  have 

’2+WSnWS2+W'  i - 1.2,;  01  € C11  . 


w 


Hence  it  follows  that 
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W - L.<V  - 6 s i tw  - W’ 


< L/9^  - Lm(Q2)  + 6 


which  in  turn  results  in 


exp[n{L##(ei)  - Lw(02)  - 6)]  < f (Z^g^/f <zn;©2) 

< expfnfL^Oj^)  - 1^(0 2 ) + 5}]  . 

Now,  since  L (9 ) is  a real  number,  one  of  the  following  relations  must 

1 09 

be  true : 

L,^)  > W’  L«(0l>  " L-(02)  °r  W < W* 

Suppose  that  1^(9  ^)  > L—(9 2 ) * T^en  S e > 0 3 1^(9  ^)  * 1^(9^  + e* 


Choose  6 < e.  Then 


f<W 

f(zn;e2) 


2 exp[n(e  -6)]  for  n 2 N(5) 


— oa  as  n — 


Similarly  for  1^(9,)  < L,,^).  we  have  lLm  Cf (Z^Sj^/f (Zn;02^ 

n -•  ao 

(For  L (9,)  ■ L (9_)  we  cannot  follow  this  approach.) 

• 1 00  Z 

(i)  Since  f(zn;021)  < f(Zn;9n)  for  n 2 no, 


n 11 


lim  [f(Zn;91)/f(Zn;92)]  - • V 9^  02- 


Since  E{||x  - x(n)||  1 9 ] is  a monotone  increasing  function  of  9,  lim 

n n - ® 

(see  Equation  (5.11))  is  maximized  for  9^  ■ 912*  Hence  lim  xQ(n)  is  a 

n -*  ® 

MMSE  estimate  matched  to  9^2> 
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(ii)  Now  f (Zn;02i>  s f(Zn;611)  for  n a some  nQ.  For  0Q  6 0^,  in 

lim  xrt(n),  0,,  0O  such  that  L (0,)  < L (0O)  is  not  possible  since  we  can 
0 1 Z ® 1 ® L 

n 

always  choose  some  0^  "closer"  to  0^  in  order  to  maximize  in  (5.11) 

for  a fixed  0^.  Therefore,  in  lim  Xg(n),  0^  and  are  such  that  either 

n -»  ® 

(9 1 ) > ( 0 2 ) or  1*^(0  j.)  = *n  the  ^ormer  case,  as  in  part  (i) 

of  this  theorem,  lim  Xg(n)  is  a MMSE  estimate  matched  to  0^- 
n — ® 

Q • E . D • 

Theorem  5.2:  Let  the  hypotheses  of  Theorem  5.1  hold  except  that  now 
©0?  ^2’  Then  the  following  holds: 

(i)  If  f (Z  ;0O-)  > f (Z  ;01O)  for  n £ some  n,,,  then  lim  xn(n)  is  a MMSE 

n n iz  u u 

n -♦  ® 

estimate  matched  to  ©22* 

(ii)  If  f (Z  ;0_„)  < f(Z  ;0.„)  for  n 2 some  n , then  in  lim  x,,(n),  0 

n zz  n iz  u u 1 

n - ® 

and  0O  are  such  that  L (0, ) < L (0O).  Further,  these  0,  and  0„  are  such 
Z ® 1 « Z L Z 

that  0q  - 0^  > 0 and  - 0Q  > 0 where  0 is  a q-vector  with  all  zero 
entries . 

Proof : The  proof  is  similar  to  that  given  for  Theorem  5.1;  therefore, 
it  is  omitted. 

An  interpretation  of  these  theorems  is  that  the  optimal  estimate 
converges  to  one  of  the  following  two  estimates:  a)  An  estimate  matched 
(optimal  in  the  MMSE)  to  a value  of  the  parameter  in  the  same  region  as 
the  true  value  independent  of  X^s,  i = 1,2.  b)  A weighted  average  of 
two  estimates  one  in  each  region,  with  the  performance  (CMSE)  of  the  one 
in  the  wrong  region  at  least  as  good  as  the  performance  of  the  one  in 
the  correct  region. 


i 


B 
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2 

An  example  of  the  case  where  E[||x^  - x(n)||  |g}  is  a monotone 
increasing  function  of  0 for  all  n is  the  problem  where  uncertainty  lies 
only  in  the  diagonal  elements  of  matrices  Q and/or  R. 

5.5.  Example 

Consider  the  following  scalar  dynamic  system: 


x 


k+1 


Q.9  xk  + wk 


Zk  “ Xk  + Vk’  k * O.1.2.--- 

with  E{xq}  * 1.0,  E{xq]  = 3.0,  E[vk]  * R = 0.2,  E{wk}  = Q.  The  unknown 
parameter  is  assumed  to  be  Q and  it  belongs  to  either  or  ^2  where 
* [0.05,0.2]  and  = [0.2, 1.0].  The  bounding  sets  are  quantized  as 
follows:  = [0.1, 0.2}  and  Oj  = [0. 2,0.4, 0.6 , 0.8 , 1. 0] . Thus  6 Kalman 

filters  are  run  in  parallel.  A computer  simulation  was  carried  out  using 
random  number  generators.  In  Figures  5.1  and  5.3  plots  for  r.m.s.  errors 
versus  true  parameter  values  are  given  for  stages  k = 3,7,15  respectively, 
averaged  over  100  Monte  Carlo  runs.  In  each  of  these  figures,  plots  for 
r.m.s.  errors  due  to  (i)  the  proposed  scheme  (ii)  the  minimax  scheme 
(iii)  the  adaptive  scheme  due  to  Magill  [36]  and  (iv)  optimal  Kalman 
filtering  assuming  knowledge  of  the  true  parameter  values,  are  given. 

In  the  proposed  scheme  we  choose  X^  * X2  m 0*5.  In  the  adaptive  scheme 
it  is  assumed  that  Q is  equiprobable  within  each  n^,  i ■ 1,2  and  that 
the  probability  Q 0 equals  X^»  i “ 1,2. 
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Figure  5.1.  RMS  error  in  state  estimate  vs.  unknown  parameter  for  k=3. 


RMS  Error 


F igure 


.2.  RMS  error  in  state  estimate  vs.  unknown  parameter  for  k=7. 


From  these  figures,  we  note  the  following.  Iu  comparison  with  the 
minimax  scheme , the  proposed  scheme  decreases  the  maximum  MSE  (mean- 
squared  error)  in  region  0^  at  the  cost  of  some  increment  in  the  maximum 
MSE  in  This  increment  in  flj*  however,  decreases  with  time  whereas 

improvement  inQ^  gets  better.  Further,  comparison  with  the  adaptive 
(Bayesian)  scheme  shows  that  the  small-sample  performance  (see  Fig.  5.1) 
of  the  proposed  scheme  is  superior.  Also,  adaptive  features  of  the 
proposed  scheme  become  obvious  from  Figs.  5.2  and  5.3.  Thus  this 
example  tends  to  support  the  objective  of  the  proposed  approach — to  reduce 
the  pessimism  of  the  standard  minimax  scheme  for  large  observation  record 
without  affecting  the  small-sample  performance. 

5.6.  Discussion 

A method  of  state  estimation  in  uncertain  linear  dynamical  systems 
with  noise  covariances  unknown  has  been  presented.  It  is  a heuristic 
extension  of  the  standard  minimax  scheme  to  the  case  when  multiple  bounds 
on  the  unknown  parameters  are  available.  The  estimator  is  shown  to 
converge  under  certain  conditions  to  a MMSE  estimator  matched  to  a value 
of  the  unknown  parameter  which  lies  in  the  same  bounding  set  as  does 
the  true  parameter  value.  An  example  of  a scalar  dynamical  system  is 
given  which  tends  to  support  the  usefulness  of  the  proposed  approach  in 
reducing  the  pessimism  of  the  standard  minimax  scheme  for  large  observa- 
tion record  and  large  uncertainties,  while  retaining  the  desirable 


small-sample  performance. 
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The  proposed  scheme  differs  from  Padilla's  approach  [45,46]  in  that 
we  consider  a weighted  conditional  mean-square  error  cost  function  given 
the  past  observations  rather  than  simply  a weighted  mean-square  error 
cost  function.  Extension  of  Padilla's  approach  to  dynamical  systems 
seems  to  be  extremely  cumbersome  computationally.  Furthermore,  it  is 
not  at  all  clear  as  to  how  Padilla's  approach  could  be  extended  to  the 
time-varying  parameter  case.  In  the  next  chapter  we  consider  extension 
of  the  proposed  scheme  to  the  case  of  time-varying  uncertainties. 
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CHAPTER  6 

ESTIMATION  FOR  SYSTEMS  WITH  UNKNOWN, 
TIME-VARYING  NOISE  STATISTICS 


6.1.  Introduction 

In  the  last  chapter  we  investigated  a detection-estimation  scheme 
for  state  estimation  in  linear  dynamical  systems  with  unknown,  time- 
invariant  noise  covariances.  In  this  chapter  the  scheme  is  extended  to 
systems  with  unknown,  time-varying  noise  covariances.  Past  work  on  this 
type  of  problem  has  been  quite  ad-hoc.  It  consists  mainly  of  heuristic 
extensions  of  the  schemes  optimal  for  the  time -invariant  uncertainty  case. 
For  example,  in  Bar-Shalom  et.al.  [7],  a fading  memory  filter  is  proposed 
to  track  the  time  variations  in  the  unknown  noise  covariances.  A similar 
approach  may  also  be  found  in  Alspach  [3],  The  "degree"  of  fading  is 
taken  to  be  a design  parameter,  and  the  scheme  requires  on-line  "tuning". 
Such  schemes  appear  to  be  able  to  track  slow  variations  in  unknown  para- 
meters quite  well,  but  may  run  into  serious  difficulties  if  the  unknown 
parameters  vary  rapidly,  precluding  enough  time  for  the  filter  to  adjust 
to  the  changes.  In  some  other  approaches,  such  as  [11]  and  [9],  it  is 
assumed  that  a state  space  model,  driven  by  white  noise,  of  the 
nonstationary  covariance  parameters  is  available.  In  most  of  the 
applications  this  may  turn  out  to  be  quite  restrictive. 
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The  proposed  approach  too  is  based  on  heuristic  considerations; 
however,  considerable  attention  is  paid  to  small-sample  performance  also, 
so  as  to  enable  the  estimator  to  handle  both  fast  as  well  as  slow 
variations  in  the  unknown  parameters.  The  proposed  scheme  is  discussed 
in  Section  6.2.  In  Section  6.3,  an  example  is  considered  to  illustrate 
the  performance  of  the  scheme. 

6.2.  Proposed  Scheme 

The  system  model  under  consideration  is  the  same  as  in  Section  5.3, 
namely.  Case  3 of  model  (1.1)-(1.2),  except  that  now  the  uncertainties 
are  allowed  to  be  time-varying.  Assume  that  the  given  model  is 
specified  up  to  a set  of  unknown  parameters,  possibly  time-varying, 
denoted  by  the  q-vector  9(k)  at  time  k.  As  in  Chapter  5 this  uncertainty 
may  be  in  noise  covariances  Qk  or  (at  time  instant  k)  only,  which  are 
assumed  to  be  continuous  functions  of  9(k).  The  unknown  parameter  vector 
9(k)  is  assumed  to  satisfy  at  least  one  of  the  following  conditions: 

9 (k)  (n.,  i =»  1,2 N 

where  n^.,  i *=  1,2,...,N  is  a collection  of  compact  subsets  of  R<'.  An 
example  of  such  conditions  is  given  below: 

q£+1  = (0(k):  a < 9 (k)  < b,f  1 0 ( k-t-1 ) -9 ( k) | < d^,f  V k} 
i = 1,2;  N = 2;  *QjX  ...  xO^,  k+1  times 

where  a,  b,  d^  are  time -invariant  q-vectors;  d^  < d^  and  0 denotes  a 
q-vector  with  all  zero  entries;  and  0(k)  ^ [9(j),  0 < j < k}  denotes  a 


^The  inequality  is  satisfied  componentwise. 
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parameter  sequence.  The  overall  region  in  which  9(k)  lies  is  known. 
Within  that  set  it  varies  either  slowly  (|9(k+l)-0(k)|  < d^)  or  can 
also  change  rapidly  ( l9(k+l)-9  (k)  ( < d_2).  Another  example  is, 

n£+1  - [®  (k) : a < 9(k)  < b^  V k] 
i - 1,2;  N - 2;  b < b 2 

In  this  case  there  is  no  restriction  on  the  rate  of  change  of  9(k). 
Depending  upon  the  problem  at  hand,  several  variations  of  the  C^'s  are 
possible.  Also,  notice  that  in  these  examples  we  have  used  nested  sets, 
that  is , 

The  objective  is  to  obtain  an  estimate  of  the  value  of  x^  based  on 
observations  » [z^,0<  j < k] . 

We  follow  the  approach  already  discussed  in  Chapter  5 except  that, 
at  stage  k,  instead  of  time- invariant  9,  we  use  parameter  sequence  ®(k). 
As  in  Chapter  5,  let  x(k)  denote  an  estimate  of  x^  based  on  the  observa- 
tion Zfc.  The  CMSE  of  this  estimate  given  the  observation  sequence  Zfc 
and  the  parameter  sequence  ®(k)  is  given  by 

L[x(k),Zk,®(k)]  - E[||xk  - x(k)||2|zk,©(k)}  (6.1) 

The  optimal  estimator  is  defined  as  the  one  that  minimizes  a weighted  sum 
of  maximum  errors  given  as 

N 

E max  [XI  L[x(k),Zk,0  (k)]]  (6.2) 

i-i  0(k)  eni 


where 
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xi  " x1f(zk!®1(k))/C  2 xjf(zk|ej(k))]  (6 

N 

E ■ 1,  Xi  > 0,  i = 1,2,. ...N 
1-1  1 

f(Zk|®i(k))  - conditional  density  function  of  given 

the  parameter  sequence  ®^(k) 
k+1 

and  ®^(k)  implies  that  0(k)  € . The  remarks  made  in  Section  5.2 

apply  here  too.  Now  the  problem  is  as  given  in  (6.4), 


(6.3) 


min  Z max  [x!L[*(k),Z  ® (k)]} 

/ 1.  \ n j i a / 1 _ \ /-  ^ i 1 1 I*  1 


x(k)  € S i-1  ®i(k)  6 


(6.4) 


where  S denotes  the  space  of  all  measurable  transformations  from  the 
observation  space  Z^  into  Rn.  Since  an  exact  solution  to  problem  (6.4) 
appears  to  be  intractable,  an  ad-hoc  solution,  given  by  (6.5),  is 
investigated  as  in  Section  5.2. 


ft(k)  - Z X’x  (k) 
i-1  1 1 


(6.5) 


where  x^k)  denotes  the  MMSE  estimator  matched  to  the  sequence  ®^(k) 
and  based  on  data  Z^.  The  efficacy  of  such  a solution  for  the  constant 
parameter  case  was  demonstrated  in  Chapter  5. 

The  problem  now  becomes  that  of  finding  ®^(k)  to  maximize 


Li  “ Xi£fl|xk  - i(k)||2|zk,®i(k)] 


(6.6) 


« \^E[llxk  - x1(k)!|2|Zk,®i(k)}  + X jj|x(k)  - xi(k)||2  (6.7) 

- X|EC||xk  - xi(k)||2|©t(k)}  + X[||x(k)  - 3ci(k)||2  (6.8) 

When  ®^(k),  i ■ 1,2,...,N  are  known,  can  be  generated  sequentially  in 
time  using  N Kalman  filters  one  each  matched  to  ®^(k). 

Approximate  Solution 

In  general  no  closed-form  analytical  solution  for  maximizing  ®^(k)'s 

exists.  So  we  have  to  resort  to  a numerical  solution.  First  the  continuous 

parameter  space  is  approximated  with  a finite  set  of  quantized  points. 

Let  0(k)  € be  quantized  into  m^  discrete  values  ©^(k),  j - l,2,...,m^; 

k+1 

i ■ 1,2,...,N.  Then,  at  stage  k,  we  have  (at  most)  nu  possible  sequences 
k+1 

®i(k)  infl.^  , i * 1,2 N;  k - 0,1,2,...  , ignoring  other  possible 

constraints  such  as  bounds  on  rate  of  change  of  0(k),  etc.  Let  ®.  (k) 

i£ 

k+1 

denote  a specific  sequence  inO^  at  stage  k.  Then  we  have  a number  of 

k+1 

Kalman  filters  one  each  matched  to®i^(k),  i - 1,2, ...,m^  ; i - 1,2,. ,.,N. 

This  gives  us  x^(k)  for  each  possible  i and  l,  where  the  subscript  it 
on  x(k)  denotes  an  MMSE  estimator  matched  to  the  sequence  ®^(k).  As  a 
byproduct  we  obtain  E[||xk  - *^(k)||  ^^©^(k)]  and  c^e  probability 
f(Zk|®i^(k))  for  all  i and  l.  All  these  computations  can  be  performed 
recursively  (see,  for  example,  [57]). 
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Now  can  be  found  for  all  possible  ©^(kys.  For  every  k,  find 

0 (k)  * arg[  max  . .L i - 1,2,. ...N  (6.9) 

10  ©.(k)€Q£+li 

Note  that  here  we  are  solving  N coupled  expressions.  Substituting  these 
0io(k),  i - 1,2,...,N  in  (6.5)  gives  the  required  estimator  x(k). 

Reduction  in  Computational  Complexity 

The  computation  and  storage  requirements  of  the  proposed  approximate 
solution  increase  exponentially  with  time.  We  shall  now  discuss  some 
further  approximations  which  will  be  used  to  control  the  number  of  Kalman 
filters  to  be  a finite,  or  below  a maximum  allowable,  number.  The  procedure 
resembles  somewhat  the  detection-estimation  approach  discussed  in  Chapter  2. 
Instead  of  considering  all  the  probability  densities  f (Z^l©^ (k) ) , 
i * 1,2,..., N,  at  stage  k,  corresponding  to  all  possible  parameter  sequences 
®ij(k),  we  shall  disregard  some  of  the  "unlikely"  sequences,  i.e.,  not 
process  the  Kalman  filters  matched  to  these  unlikely  sequences.  Also, 
if  the  densities  corresponding  to  two  distinct  sequences  are  "close"  to 
each  other  in  certain  distance  measure,  only  one  of  them  need  be 
considered. 

To  develop  an  editing  criterion  to  limit  the  number  of  "modal" 
filters,  we  need  a distance  measure.  A measure  of  distance  between  two 
probability  densities  f^(x)  and  f^(x)  is  the  Bhattacharyya  coefficient 
(B-coeff.)  pij  given  by 
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and  the  B-distance  is  given  by  -£np  ^ . Note  that  0<  p ^ < 1 and  for 
f^(x)  ■ fj(x),  * 1.  The  reasons  for  choosing  B-coef f icient  as  distance 

measure  are  that  (i)  it  is  closely  related  to  the  performance  bounds  in 
detection  and  estimation  theory  [29]  and  (ii)  it  is  easy  to  compute  for 
Gaussian  densities. 

In  order  to  establish  criteria  according  to  which  we  make  decisions 
to  drop  a parameter  sequence  we  need  a threshold  a where  0 < a < 1.  Now 
two  parameter  sequences  ©^(k)  and  ®^(k)  will  be  considered  distinguishable 
iff  the  B-coef f icient  between  the  densities  f(ZjJ©^(k))  is  less  than  a. 

Then  we  have  the  following  algorithm  to  find  the  "best"  state  estimate. 

It  is  given,  for  simplicity,  only  for  the  case  of  two  (nested)  bounding 
sets  Ql,  i ■ 1,2;  n^c  Extension  to  larger  number  of  sets  is 

straightforward . 

Algorithm 

Let 

M ■ Maximum  number  of  Kalman  filters  to  be  processed  at  any  time  k. 

* Maximum  number  of  Kalman  filters  matched  to  parameter  sequences 
k+1 

in  at  any  time  k. 

M2  ■ Maximum  number  of  Kalman  filters  matched  to  parameter  sequences 
k+1 

in  ^2  at  any  time 
■ M since  c d 2 

Step  1.  Compute  the  CMSE  , i ■ 1,2,  for  all  admissible  parameter 

sequences  © ^ ^ ( k)  at  time  k where  denotes  the  CMSE  L^,  given  by  Equation 

(6.8),  computed  for  a specific  sequence  ©^(k).  Find  the  pair  of  sequences 

k+1 

(©10(k),9,n(k)),  ©Jrt(k)  , i - 1,2,  which  maximizes  L< , i - 1,2, 


20 


'10 


given  by  Equation  (6.9). 
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k+1 

Step  2.  Now  select  the  sequence  ® „(k)  £ Q ^ , Q„(k)  t which 

gives  the  largest  closest  to  L^q  where  is  computed  using  sequence 
®io(k).  Check  if  < a where  p^Q  is  the  B-coeff icient  between  densities 
f(Zk|©ij(k))  and  f (Zk|®i(j(k)),  i = 1,2.  If  true,  then  this  parameter 
sequence  is  selected  as  the  sequence  corresponding  to  the  "second"  Kalman 


filter  of  the  maximum  allowed  number  for  set  Q . 


k+1 


If  false,  then 


select  the  ©^(k)  which  gives  the  next  largest  CMSE  and  repeat  the 

procedure.  The  process  is  continued  until  either  the  number  of  selected 

k+1 

sequences  equal  M^,  i * 1,2,  for  regions  0^  , i = 1,2,  respectively,  or 

all  admissible  sequences  have  been  considered,  whichever  occurs  earlier. 

i k+1 

Let  Mi  denote  the  number  of  sequences  selected  in  region  , i = 1,2. 

Step  3.  Set  k = k+1  and  go  to  Step  2,  with  the  exception  that  now  ®^(k+l) 

are  all  possible  "extensions"  of  0^Ck)>  where  l = 1,2,...,M*;  M*  < 1L, 

k+2 

i * 1,2,  consistent  with  the  specifications  of  the  set  n..  , i = 1,2. 

Remark  6.1.  The  integers  , i = 1,2,  are  design  parameters,  usually 
dictated  by  the  computation  and  storage  capabilities  of  the  processor, 
and  accuracy  desired.  Also,  the  threshold  a is  selected  empirically. 

Remark  6.2.  In  Step  2,  all  distances  are  measured  from  the  selected 
"optimum"  pair  (®1Q(k) ,®2Q(k) ) . 


Remark  6.3.  The  scheme  given  above  includes  the  one-step-at-a-time 
approach  in  which  only  two  Kalman  filters  corresponding  to  ©^(k), 
i *»  1,2,  are  carried  forward  to  stage  k+1. 

In  the  next  section  we  consider  an  illustrative  numerical  example. 
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6.3.  Example 

To  illustrate  the  feasibility  of  the  scheme,  a simple  system  was 
simulated.  The  dynamics  of  the  system  is  represented  by  the  scalar  state 
equation 

xk+l  * °-98  xk  + Wk 
and  the  observation  equation  is 

Zk  * Xk  + Vk’  k ” • • 

The  initial  state  is  x^  ~ N(10,100),  whereas  for  simulation  the  observation 
samples  were  obtained  using  Xg  » 1.  Furthermore,  v^  ~ N(0,0.5)  and 
w^  ~ N(0,Q^)  V k.  The  driving  noise  covariance  is  assumed  to  be  unknown. 
It  is  further  assumed  that  is  time-varying  and  is  known  to  belong  to 
either  orflj,  where  n 1 = (Qk=Qk  * 1.2,4}  and  n2  - CQk;Qk  * 1,2,4,6,8,10}. 
The  bounding  sets  , i * 1,2,  may  be  thought  of  as  resulting  from  quantiza- 
tion of  the  continuous  sets  [Qk;Qk  S 4.0}  and  {Qk:Qk  ^ 10. 0} , respectively. 
There  is  no  restriction  on  the  rate  of  change  of  Q^. 

The  system  was  simulated  using  random  number  generators,  and  filtering 
was  carried  through  59  stages.  The  "true"  driving  noise  covariance 
was  chosen  as  given  below: 

Qk  * 1.0  for  0 < k < 9 

3.0  for  10  < k<  29 

8.0  for  30  < k < 44 

1.0  for  45  < k < 59 

Figure  6.1  shows  the  true  covariance  vs.  time.  The  performances  of 
various  schemes  (described  below)  were  computed  by  averaging  over  100 
Monte  Carlo  runs.  The  same  sequence  of  true  noise  covariance  was 
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used  for  all  the  runs.  The  proposed  detection-estimation  approach  was 

implemented  using  ^ = 0*5,  = 5 and  M2  = 10.  Therefore,  we 

k k 

carry  forward  at  most  5 Kalman  filters  in  0^  and  10  Kalman  filters  in 

Furthermore,  at  any  time  k,  we  consider  at  most  15  extensions  of  M^ 

k+1  k+1 

sequences  in  O^  and  60  extensions  of  M2  sequences  inf^  • The 

B-distance  feature  was  not  incorporated.  In  Fig.  6.2,  plots  for 

"instantaneous"  r.m.s.  error  in  state  estimate  vs.  time  are  given  for  a 

typical  observation  sample  sequence  for  the  two  approaches : proposed 

detection-estimation  scheme,  and  optimal  Kalman  filter  using  the  true 

covariance  sequence  {Q^} . 

In  Fig.  6.3,  plots  for  average  r.m.s.  error  in  state  estimate  vs. 
time  are  shown,  averaged  over  100  runs,  for  the  following  schemes:  (i) 
proposed  detection-estimation  scheme,  (ii)  optimal  Kalman  filter  using 
true  covariance  sequence  {Q^},  (ill)  fixed  parameter  Kalman  filter  with 
fixed  covariance  value  given  by  = 8.0  V k.  It  is  clear  from  Fig.  6.3 
that  the  proposed  scheme  is  capable  of  "tracking"  the  system  state  very 
well  for  relatively  high  values  of  Q^;  see  the  performance  over 
10  < k < 44.  For  low  values  of  (i.e.,  = 1.0),  the  state  estimate 

due  to  the  detection-estimation  scheme  has  relatively  high  r.m.s.  error 
compared  to  that  due  to  optimal  Kalman  filter  with  true  covariance 
sequence.  However,  the  detection-estimation  scheme  is  much  better  than 
the  fixed  parameter  Kalman  filter  with  covariance  = 


8.0  V k. 


R.M.S.  Error 
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In  Table  6.1,  the  combined  ensemble-  and  time-averages  of  r.m.s. 
errors  due  to  various  schemes  are  given.  The  r.m.s.  error  in  the  state 
estimate  was  first  averaged  over  100  Monte  Carlo  runs,  then  it  was 
averaged  over  59  stages.  Let  Dt  denote  the  combined  ensemble-  and  time- 
average  of  r.m.s.  error  in  the  state  estimate,  and  let  Dg(k)  denote  the 
ensemble  average  (over  100  Monte  Carlo  runs)  of  r.m.s.  error  at  time  k. 
Then,  we  have 

1 59 

Dt  = 59  kE=1Vk)  * 

The  average  r.m.s.  error  Dt  was  computed  for  the  following  schemes:  (i) 
proposed  detection-estimation  scheme,  (ii)  optimal  Kalman  filter,  (iii) 
fixed  parameter  Kalman  filters  with  = 10.0,  8.0  and  1.0,  respectively, 

V k.  Table  6.1  gives  the  average  r.m.s.  error  Dfc.  The  fixed  parameter 
Kalman  filter  with  = 10.0  Y k is  the  minimax  filter  for  the  given 
problem.  The  improvement  over  the  fixed  parameter  filters,  due  to  the 
proposed  scheme,  is  quite  obvious  from  Table  6.1. 

6.4.  Discussion 

The  detection-estimation  scheme  presented  in  Chapter  5 has  been 
extended  to  linear  dynamical  systems  with  unknown,  time-varying  noise 
covariances.  An  approximate  solution  is  developed  to  reduce  the  computa- 
tional complexity  of  the  proposed  scheme.  A numerical  example  of  a scalar 
dynamical  system  is  considered  to  illustrate  the  feasibility  of  the  scheme. 
The  motivations  for  considering  the  proposed  approach  are  the  same  as  in 
Chapter  5,  namely,  reduction  in  the  pessimism  of  the  standard  minimax 
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Table  6.1 


Comparison  of  the  average 
various  schemes. 

r.m.s.  error  due  to 

Scheme 

Average  r.m.s.  error  Dt 

Optimal  Kalman 

0.6229 

Detection-Estimation 

0.6621 

Fixed  Parameter 

Kalman  Filter  with 

Qk  * 10.0 

0.6765 

Qk  = 8.0 

0.6728 

1.0 


0.6933 


116 


scheme  for  large  uncertainties  and  large  observation  record  while  retaining 
its  desirable  small-sample  performance.  A good  small-sample  performance 
should  enable  the  proposed  scheme  to  "track"  well  rapid  variations  in  the 
noise  covariances,  whereas  the  schemes  relying  mainly  on  good  asymptotic 
performance  do  not  have  enough  time  to  adjust  to  the  rapid  changes. 

We  have  not  compared  the  proposed  scheme  to  any  other  approach  to 
state  estimation  for  systems  with  unknown,  time-varying  noise  covariances, 
such  as  given  in  [3]  and  [7].  The  main  reason  for  this  is  that  these 
approaches  need  parameters  which  can  only  be  found  empirically  by  "tuning" 
the  filters  on-line.  Therefore,  any  implementation  of  these  schemes  needs 
» extensive  simulations  to  select  suitable  values  for  such  parameters. 

Moreover,  the  values  of  these  parameters  depend  upon  the  rate  of  change  of 
unknown,  time-varying  uncertainties.  Such  a comparison  is,  nevertheless, 
of  considerable  interest  and  is  left  for  future  research.  Finally,  we  note 
that  more  extensive  simulations  are  needed  to  judge  the  suitability  of  the 
proposed  detection-estimation  approach  to  state  estimation  for  systems 
with  time-varying  covariances. 
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CHAPTER  7 

SUMMARY  AND  CONCLUSIONS 

In  this  thesis  the  problem  of  state  estimation  for  a class  of  linear 
discrete-time  dynamical  systems  with  unknown  time-varying  parameters  has 
been  studied.  The  basic  objective  was  to  model  the  time-variations  of  the 
unknown  parameters  in  an  appropriate  fashion  to  render  the  problem 
analytically  and  computationally  tractable,  while,  at  the  same  time,  to 
employ  models  general  enough  to  describe  a large  class  of  time-varying 
characteristics  of  the  system  parameters.  Another  objective  was  to  develop 
estimation  algorithms  for  the  chosen  models  and  to  investigate  their 
properties,  specifically,  to  examine  their  asymptotic  behavior.  Attention 
was  focused  mainly  on  systems  with  unknown  time-varying  noise  statistics. 
Two  different  approaches  to  modeling  and  estimation  under  time-varying 
uncertainties  were  investigated.  In  one  of  the  approaches,  the  jump 
parameters  were  assigned  a probabilistic  description.  In  the  other 
approach,  multiple  bounds  on  the  unknown  parameter  values  and  its  time 
derivatives  were  assumed  to  be  available. 

In  Chapters  2 through  4 we  used  a finite  state  Markov  chain  model  for 
the  jump  parameters.  The  parameters  were  assumed  to  take  values  only  from 
a finite  set,  with  transitions  from  one  value  to  another  determined  by  a 
Markov  transition  probability  matrix.  In  Chapter  2 the  transition  matrix 
was  assumed  to  be  known.  A Bayesian  approach  was  adopted  for  MMSE  state 
estimation  and  the  optimal  estimator  was  found  to  require  exponentially 
increasing  computation  and  storage  capabilities  with  time.  A new 


suboptimal  approach,  employing  a detection-estimation  scheme,  was  proposed 
in  Chapter  2 to  alleviate  the  excessive  computational  requirements  of  the 
optimal  state  estimator.  Two  simulation  examples  were  presented  which 
indicated  the  superiority  of  the  proposed  detection-estimation  approach 
over  some  of  the  existing  suboptimal  algorithms.  The  approaches  discussed 
in  Chapter  2 are  applicable  to  general  dynamical  models  with  jump  para- 
meters. However,  in  Chapter  3,  we  confined  our  attention  to  switchings 
in  the  noise  statistics  only. 

In  Chapter  3,  an  adaptive  Bayesian  MMSE  estimation  scheme  was 
investigated,  both  analytically  and  through  computer  simulation.  The 
Markov  transition  probability  matrix  was  assumed  to  be  unknown  and  to 
belong  to  a finite  set  which  also  contained  the  true  transition  matrix. 

The  objectives  in  this  chapter  were  to  present  feasible  adaptive  algorithms 
that  sequentially  produced  an  approximate  MMSE  estimator,  and  to  investigate 
the  asymptotic  behavior  of  the  Bayes  optimal  adaptive  estimation  scheme. 
First  the  optimal  and  the  suboptimal  approaches  discussed  in  Chapter  2 
were  extended  to  the  case  of  unknown  transition  matrix.  Then  conditions 
were  investigated  under  which  the  constrained  maximum  likelihood  estimate 
of  the  transition  matrix  converged  in  probability  to  the  correct  value 
of  the  transition  matrix.  The  convergence  of  the  a posteriori  probabilities 
of  the  transition  matrices  given  the  past  observations,  and  the  convergence 
of  the  performance  of  the  optimal  adaptive  scheme  were  also  investigated. 
Finally  a simulation  example  was  presented  to  illustrate  the  convergence 


results. 
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In  Chapter  4 we  dealt  with  dynamical  systems  with  interrupted 
observations.  The  interrupted  observation  mechanism  was  expressed 
in  terms  of  a stationary  two-state  Markov  chain  whose  transition 
probability  matrix  was  assumed  to  be  unknown  and  to  belong  to  a finite 
set.  The  asymptotic  behavior  of  the  Bayes  optimal  adaptive  estimation 
scheme  was  investigated  analytically  as  in  Chapter  3.  Since  extensive 
simulation  results  for  this  case  are  available  in  the  literature,  none 
were  presented. 

An  alternate  model  for  the  unknown,  time-varying  parameters  was 
considered  in  Chapters  5 and  6.  Attention  was  confined  to  systems  with 
unknown  noise  covariances.  It  was  assumed  that  multiple  bounds  on  the 
unknown  covariances  and  their  time  derivatives  were  available.  In 
Chapter  5 we  dealt  only  with  constant,  unknown  covariances;  the  primary 
purpose  was  to  illustrate  the  concepts  involved.  A detection-estimation 
approach  was  proposed  for  state  estimation  and  its  asymptotic  behavior 
was  analyzed.  The  primary  objective  of  the  proposed  approach  was  to 
reduce  the  pessimism  of  the  standard  minimax  estimator  for  large 
observation  records  and  large  uncertainties,  while  retaining  its  desirable 
small-sample  properties.  A simulation  example  was  presented  which 
demonstrated  some  of  the  advantages  of  the  proposed  scheme  over  the 
standard  minimax  scheme  and  the  Bayes  optimal  MMSE  estimation  scheme. 

The  approach  of  Chapter  5 was  extended  to  the  case  of  unknown  time- 
varying  noise  covariances  in  Chapter  6.  The  extension  was  found  to  require 
excessive  computational  capabilities,  therefore,  an  approximation  to  the 
proposed  scheme  was  discussed.  The  feasibility  of  the  approximation  was 
demonstrated  through  a simulation  example. 
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Finally,  we  indicate  some  research  problems  which  should  follow  the 
work  presented  in  this  thesis.  In  Chapters  3 and  4 we  assumed  that  the 
unknown  transition  probability  matrix  could  only  take  one  of  a finite 
number  of  values.  Extension  of  the  convergence  results  of  Chapters  3 and 
4 to  the  case  when  the  unknown  transition  matrix  lies  in  a compact  set 
should  be  investigated.  Then  one  can  use  the  standard  methods  of 
parameter  optimization  like  gradient  method,  conjugate  gradient  method, 
etc.,  to  find  the  maximum  likelihood  estimate,  which  obviates  the  need 
for  computing  the  joint  probability  density  of  the  observations  for  each 
and  every  candidate  value  of  the  transition  matrix,  resulting  in  a 
considerable  saving  in  computations.  A tacit  assumption  in  the  use  of 
discrete-time  system  models  in  this  thesis  is  that  the  jumps  in  the 
unknown  parameters  occur  exactly  at  the  sampling  time  instants.  Use  of 
continuous -time  systems  should  therefore  result  in  better  models. 
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APPENDIX  A 

PSEUDO-BAYES  APPROXIMATION  TO  STATE  ESTIMATION 
IN  SWITCHING  ENVIRONMENTS 

In  this  appendix  the  pseudo-Bayes  approximation  to  the  optimal  MMSE 

estimator  of  Section  2.2  is  described.  This  approximation,  due  to 

Ackerson  and  Fu  [1],  is  based  on  approximating  Equations  (2.4)  and  (2.5). 

It  is  assumed  that  f(x^|z^)  is  normally  distributed  with  mean  x(k|k) 

k+1 

and  covariance  P(k),  whereas,  in  truth,  it  is  a sum  of  S separate 
Gaussian  distributions.  By  making  this  assumption,  (2.5)  is  expressed  as 


P(ik=i|zk)  = 


f(zk|ik=i,Zk-l)P(ik=i|zk-l) 

^1f(zk|ik=i'Zk-l)P(ik=£lzk-l) 


(Al) 


The  value  of  P(i^=i|z^_^)  is  available  from  P(i^_ ^=1 [ Zk_^) » ^=1,2,..,S 


and  transition  matrix  n.  Since 

f(xjzk)  ~ N(x(k|k)  ,P(k)) 


(A2) 


we  have 


f (z  | ik=i,Zk_1)  ~ N{C(Ax(k-l|k-l)  +Bti.(l))  + v(i)  ,CMi(k)CT  + R(l)}  (A3) 


Furthermore  (2.4)  may  be  written  as 


£(k|k)  = Z P(i,-C|zu)x,(k|k) 

i = l 


(A4) 


where  x^(k|k)  is  calculated  from  (2 . 9 ) - (2 . 12 ) by  using  x(k-l|k-l)  as  the 

previous  estimate.  The  new  covariance  matrix  P(k)  is  defined  as 
S 

P(k)  = Z P(i  «*|z  )P,(k)  . (A5) 

1=1  * K 
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where  P^(k)  is  calculated  as  in  (2.12).  Notice  that,  unlike  the  optimal 
estimator  case,  now,  given  sequence  Ij(k),  x^(k|k)  depends  on  rr.  The 
memory  requirements  are  thus  reduced  to  a constant  independent  of  the 
number  of  steps  the  process  runs. 


r 
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APPENDIX  B 

RANDOM  SAMPLING  APPROACH  TO  STATE  ESTIMATION 
IN  SWITCHING  ENVIRONMENTS 

In  this  appendix  the  random  sampling  approximation  to  the  optimal 

MMSE  estimator  of  Section  2.2  is  described.  It  is  due  to  Akashi  and 

Kumamoto  [2]*  and  is  based  on  a Monte  Carlo  method.  The  space  of  the 

k+1 

Markov  chain  state  sequences  Ij(k),  j = 1,2,...,  S , is  regarded  as  a 
population  and  the  state  estimate  is  calculated  with  a relatively  small 
number  of  sequences  sampled  at  random  from  the  population.  Equation  (2.4) 
may  be  rewritten  as 


x(k|k)  = x(k|k)  = ( Z x.(k|k)f.(I.(k)))/  Z f.  (I,(k))  (Bl) 
Ij(k)  J k J Ij(k)  K J 

where 

A ^ 

fk(Ij(k)>  = [f(zjt|l(X),Zji_l)P(i_e|ije_1>]  (B2) 

Let  the  integer  N be  the  number  of  sequences  lu(k,k)  = f Iq,!^, . . . ,i^}, 
u = 1,...,  N,  sampled  by  the  procedure  of  Step  1)  given  below.  For  each 
integer  y,  1 s u s N,  let  (?q,?^, ...,?£»•• •}  be  a sequence  of  independent 
random  numbers  each  of  which  is  distributed  with  uniform  distribution 
between  0 and  1.  It  is  assumed  that  these  random  numbers  are  independent 
of  {w,  },  {v  } , x and  {i  }.  The  subsequence  0 s i S k]  is  used  to 

(C  K O iC  ■*> 

sample  the  sequence  IU(k,k)  at  the  k-th  stage.  Now  define  sampling 
probabilities  P^(i^|  I (X-l)  ;Z^) , SL  * 0,1,.., k,  satisfying  the  following 
conditions . 


In  [2]  discussion  is  confined  to  switchings  in  measurement  noise  only. 
However,  extension  of  their  approach  to  a more  general  switching  model 
is  straightforward. 


, t a*. 


. A. 
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cl)  For  each  2,  Pk(i^|l(2-1) ;Z^)  is  a function  of  1(2)  and  its 

functional  form  is  determined  either  by  or  a subsequence 

of  Z,  . 
k 

c2)  With  respect  to  the  argument  i^ ,P^(i^ | 1 (2-1) ;Z^)  is  a 

A , 1 

probability  on  the  set  Q = j.l,2,..,S}. 
c3)  Let  denote  the  Cartesian  product  of  k sets  Q.  The 
probability  on  which  is  defined  by 


Pk(i(k);zk)  = n Pk(i£li (2-1) ;zk) 

2=0 


(B3) 


differs  from  zero  whenever  fk(I(k))  differs  from  zero. 

The  minimum  variance  estimator  x(k|k),  for  a given  tt  = tt^,  is  then 
calculated  as  follows: 

Step  1:  Suppose  that  the  2-long  sequence  IU(k,2-l)  = [ik 
has  been  sampled.  Then  an  a f{l, . . ,s]  is  found  to  satisfy 

df-l  v a 

Z Pk(^llU(k^-l);Zk)  < * T Pk(ii|lU(k,2-l);Zk)  (B4) 

i2_1  i2=1 
and  the  next  state  in  the  sequence  is  chosen  such  that 


lftc,2)  = [lU(k,2-l),i”^}  = {lv(k,2-l),a} 


(B5) 


If  for  each  v,  1 s V ^ N,  Step  1)  is  performed  for  2 = 0,1,.., k,  then  the 
N sequences  IV(k,k)  = (ik  g,...,ik  k)  can  be  sampled.  In  practical  terms, 
(B4)  and  (B5)  imply  random  sampling  of  the  value  of  ik  ^ from  the  set  Q 
with  probability  ?k(i^!lV(k,2-l) ;Zk) . Therefore,  IU(k,k)  is  distributed 
with  probability  ?k(I(k);Zk)  of  (B3). 
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Step  2:  The  estimate  x(k[k)  is  approximated  by  x(k|k): 

N 

n'1  e [xu(k|k)fk(iu(k,k))/Pk(iu(k,k);zk)] 

x(k|  k)  « — (B6) 

i N 

E [f,  (IU(k,k))/P,  (Iu(k,k);Z.  )] 
u-1  k 

where  x^klk)  = E{xk|zk,IU(k,k)] . Akashi  and  Kumamoto  [2]  have  shown 
that  the  following  properties  hold: 

PI)  For  given  Zk>  the  numerator  and  the  denominator  of  (B6)  are  the 
unbiased  estimators  of  the  numerator  and  the  denominator  of 
(Bl) » respectively.  That  is, 

N 

e{n_1  E [iu(k|k)fk(IU(k,k))/Pk(ly(k,k);Zk)]|zk} 
u=l 

= E x (kfk)f.  (I.(k))  CB 7) 

I j (k)  J k J 

and 

N 

e{n_1  E [fk(IU(k,k))/Pk(Ii;(k,k);Zk)]|zk 
u=l 

P2)  Given  Zk>  IU(k,k),  v*l,..,N,  are  independent  random  variables. 
That  is,  i 

N 

P(I1(k,k),...,lN(k,k)|z  ) - n Pk(IU(k,k);Zk)  (B9) 

u=l 

1 N 

P3)  Given  Zk,  lim  N_i  E [f  (Iv(k,k))/Pk(IU(k,k) ;Zk)]  « f(Zk) 

N “•  00  v=l 

a.e.  P (I(k);Z,  ).  This  follows  from  the  law  of  large 
k k 

numbers  and  Equations  (B2),  (B8)  and  (B9). 


= E f,  (I . (k) ) (B8) 

Ij(k)  J 
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P4)  Let  J^.N)  = E{(xk-x(k|k))TQ(xk-x(k|k))|zk} 
and  J°(Zk)  = E{(xk-x(k|k))TQ(xk-x(k|k))|zk}. 

Then  J(Zk,N)  “ J°(Zk)  + 0(N_1). 

It  remains  to  specify  a procedure  to  determine  Pk(i^ | I (£-1) ;Zk) , 
£“0,1,.., k.  In  [2]  the  following  method  has  been  suggested: 


Pk(iJl(£-l);Zk)  = P(ijl(£-l),zp 

f(zJx(X>  ,Z£-l^P^£ ' ^£-1^ 

S 

Z [numerator] 


(BIO) 


It  satisfies  conditions  cl),  c2)  and  c3) . An  important  consequence  of 
such  a definition  is  that,  for  each  £,  we  have 


52(ijl(£-l);zp  " 

“ ....  “ Pk(iJl(£-l);Zk)  (Bll) 

Furthermore 

IU(k,k-l)  * IV(k-l,k-l)>  v - 1,2,..,N  (B12) 


Therefore,  the  sequence  IU(k-l,k-l)  which  is  used  to  calculate  x(k-l|k-l) 

may  again  be  used  as  the  subsequence  of  IU(k,k),  so  that  IJ(k,k)  at  the 

y 

k-th  stage  can  be  obtained  by  sampling  only  the  k-th  component  ik  k- 
The  storage  and  computation  requirements  are  thus  reduced  to  a 
constant  (N  filters)  independent  of  the  number  of  steps  the  process  runs. 
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